Closed: a-l-e-x-d-s-9 closed this issue 1 year ago
You need to recompile or reinstall bitsandbytes. It was probably compiled for a different CUDA version, or your CUDA path variables aren't set, so it doesn't know where to look for the libraries.
@Thomas-MMJ Thanks for the suggestion. I activated the venv and ran "pip3 uninstall bitsandbytes" (it was version 0.35.0), then "pip3 install bitsandbytes", which installed version 0.35.4. That did not solve the problem. I also opened a bug for the other problem I mentioned: #535. I'm not sure how the Python virtual environment for the webui coexists with my system as far as CUDA versions go. From what I can see, my system uses CUDA 12, but the virtual environment uses libraries for CUDA 11 or 10. Should the webui venv use CUDA 12, or is it limited to 10/11 because of PyTorch or some other essential component? Can you explain how I can fix the "cuda path variables"? Should I upgrade the other CUDA-related Python packages, or might that break my whole venv for the webui?
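One way to see which CUDA libraries the "path variables" actually expose is to list the matching files in each LD_LIBRARY_PATH directory. The sketch below is only an illustration: `find_shared_libs` is a hypothetical helper name, not part of bitsandbytes or the webui, and it only looks at LD_LIBRARY_PATH-style strings. It does not consult the system loader cache or the default library directories, so an empty result does not prove the library is missing.

```python
import glob
import os


def find_shared_libs(pattern, search_path=None):
    """Return files matching `pattern` in each directory of an
    LD_LIBRARY_PATH-style string (directories separated by os.pathsep)."""
    if search_path is None:
        # Assumption: the variable of interest is LD_LIBRARY_PATH, as in the thread.
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a leading or trailing separator
        # Escape the directory so glob characters in its name are taken literally.
        full_pattern = os.path.join(glob.escape(directory), pattern)
        matches.extend(sorted(glob.glob(full_pattern)))
    return matches


if __name__ == "__main__":
    for path in find_shared_libs("libcurand.so*"):
        print(path)
```

Running this inside the activated venv, before and after the export line, shows whether a directory such as /opt/resolve/libs/ is the one supplying libcurand.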
You might have to go to the bitsandbytes repository and read their directions. But yes, you might have to downgrade from CUDA 12 to CUDA 11.8 until they release an update (11.x releases are backwards compatible, and 12 is too, but most repositories haven't been updated to check for CUDA 12.0 yet).
https://github.com/TimDettmers/bitsandbytes
https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md
I installed the cuda-11.8 package. I still need to run: "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/resolve/libs/". Now the "Exception importing 8bit adam: libcurand.so.10" error no longer appears. @Thomas-MMJ Thanks for helping.
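A quick way to confirm a fix like this is to ask the dynamic loader directly whether it can open the library named in the error. This is a minimal sketch using only the standard library. `can_load` is a hypothetical helper name, and the check says nothing about whether bitsandbytes itself was built for the right CUDA version.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name` in this process."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the loader cannot find or open the shared library.
        return False
    return True


if __name__ == "__main__":
    name = "libcurand.so.10"  # the library named in the exception from this thread
    print(name, "is loadable" if can_load(name) else "is NOT loadable")
```

Note that the loader reads LD_LIBRARY_PATH when the process starts, so the export has to happen in the shell before launching Python. Changing the variable from inside a running interpreter does not affect later loads.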
All the information provided:
Have you read the Readme? Yes
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? Yes
Have you updated Dreambooth to the latest revision? Yes
Have you updated the Stable-Diffusion-WebUI to the latest version? Yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. OK

Describe the bug
When starting training I see the following exception:
Despite the exception, training seems to work, although I'm not sure whether the results are corrupted by it. In the UI everything looks fine: training finishes and produces a ckpt. From what I understand, libcurand.so.10 is a library for CUDA 10, and I have CUDA 12 installed on my system. I'm not sure which CUDA version the Python virtual environment for the webui is using. I also have to run this line first: "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/resolve/libs/"; otherwise I get the exception "NameError: name 'str2optimizer8bit_blockwise' is not defined" when starting training. I'm not sure the export is the proper solution. Is this something that needs to be added to the requirements? Or is a symbolic link missing? I'm not sure, please help.
Provide logs
Environment
What OS? Ubuntu 22.04
Linux Venus 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
If Windows - WSL or native? No
What GPU are you using? RTX 2080 8GB
Driver Version: 525.60.13 CUDA Version: 12.0

Screenshots/Config
If the issue is specific to an error while training, please provide a screenshot of training parameters or the db_config.json file from /models/dreambooth/MODELNAME/db_config.json
db_config.json.log