152334H / DL-Art-School

TorToiSe fine-tuning with DLAS
GNU Affero General Public License v3.0

The addition of 'bitsandbytes' may have broken training #35

Closed absane closed 1 year ago

absane commented 1 year ago

Yesterday and this morning I was troubleshooting a training issue, but nonetheless DLAS was working. This afternoon I saw an update in the Windows GUI, so I applied it. Ever since then, I've been running into this issue:

Environment name is set as "DLAS" as per environment.yaml
anaconda3/miniconda3 detected in C:\ProgramData\miniconda3
Starting conda environment "DLAS" from C:\ProgramData\miniconda3
Latest git hash: 43f445d
Disabled distributed training.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
  warn(
C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\cuda_setup\paths.py:93: UserWarning: C:\Users\james\.conda\envs\DLAS did not contain libcudart.so as expected! Searching further paths...
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
Traceback (most recent call last):
  File "C:\Users\james\Desktop\DL-Art-School\codes\train.py", line 386, in <module>
    trainer.init(args.opt, opt, args.launcher)
  File "C:\Users\james\Desktop\DL-Art-School\codes\train.py", line 38, in init
    maybe_bnb.populate()
  File "C:\Users\james\Desktop\DL-Art-School\codes\maybe_bnb.py", line 15, in populate
    import bitsandbytes as bnb
  File "C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "C:\Users\james\.conda\envs\DLAS\lib\site-packages\bitsandbytes\cextension.py", line 31, in initialize
    self.lib = ct.cdll.LoadLibrary(binary_path)
  File "C:\Users\james\.conda\envs\DLAS\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\james\.conda\envs\DLAS\lib\ctypes\__init__.py", line 364, in __init__
    if '/' in name or '\\' in name:
TypeError: argument of type 'WindowsPath' is not iterable
Press any key to continue . . .

Initially I thought the cause might be CUDA, Miniconda, or Python, since I had so many different versions installed and probably some broken libraries/packages. I uninstalled everything, started with a clean slate, and I still get this error. The longer I look into it, the more it seems to be related to 'bitsandbytes', given the stack trace and the commit history of the last push, which shows that it was recently added.
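For anyone reading the traceback: the last frame fails on `'/' in name` because `name` is a `WindowsPath` rather than a string. Below is a minimal sketch of that failure and one possible workaround, assuming the path only needs converting to `str` before it reaches `ctypes`. The helper names `membership_check_fails` and `load_shared_library` are hypothetical and not part of bitsandbytes or DLAS, and this does not address the separate problem that a CPU `.so` binary was selected on Windows.

```python
import ctypes
import os
from pathlib import Path


def membership_check_fails(binary_path):
    """Mimic the `'/' in name` check from the final traceback frame."""
    try:
        "/" in binary_path
    except TypeError:
        # pathlib paths are neither containers nor iterables, so `in` raises.
        return True
    return False


def load_shared_library(binary_path):
    """Hypothetical wrapper: accept a str or a Path and hand ctypes a plain str."""
    # os.fspath() converts a Path to str and returns a str unchanged.
    return ctypes.cdll.LoadLibrary(os.fspath(binary_path))
```

With a plain string the membership check succeeds, so `membership_check_fails("C:/some/lib.dll")` returns `False`, while passing a `Path` returns `True`.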

Reverting to a previous commit works:

git checkout 83b901c656447126d5a0877639d394335204e1ac

This is Windows 10, Python 3.10, CUDA 11.7.

152334H commented 1 year ago

Unfortunately, Windows training is the purview of @devilismyfriend, and I cannot really do much about it.

He said a while ago that he would be working on it, so stay tuned :)

https://github.com/152334H/DL-Art-School/issues/8#issuecomment-1442536426

152334H commented 1 year ago

I also added a temporary commit so that casual Windows users don't get an error on the latest commit, but 8-bit will not be enabled on Windows until that fix happens.
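As a rough illustration of how an optional dependency can be guarded so that a failed import does not abort training, here is a minimal sketch. It is not the actual contents of that commit or of `maybe_bnb.py`; the helper name `try_import` is hypothetical. It catches `Exception` rather than only `ImportError` because the failure reported above was a `TypeError` raised during import.

```python
import importlib
import warnings


def try_import(module_name):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:
        # Broad on purpose: the import in this issue died with TypeError.
        warnings.warn(f"{module_name} unavailable ({exc!r}); continuing without it")
        return None
```

A caller could then run `bnb = try_import("bitsandbytes")` and only enable 8-bit code paths when `bnb` is not `None`.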

absane commented 1 year ago

All good, thank you. The rollback works for now, but I did decide to move over to Colab for the majority of the work after I found my 3090 wasn't as fast as Colab. Windows was nice, though, because I was getting tired of all the tabs and terminals I had open ;)

Thank you!

152334H commented 1 year ago

"I did decide to move over to Colab for the majority of the work after I found my 3090 wasn't as fast as Colab"

This is very weird and honestly shouldn't be happening. I'll test it on my own to check.