d8ahazard / sd_dreambooth_extension


Exception importing 8bit adam: libcurand.so.10: cannot open shared object file: No such file or directory #534

Closed a-l-e-x-d-s-9 closed 1 year ago

a-l-e-x-d-s-9 commented 1 year ago

All the information provided:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Commit hash: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490
Installing requirements for Web UI

#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Dreambooth revision: 9e3584f0edd2e64d284b6aaf9580ade5dcceed9d
SD-WebUI revision: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.14.dev0 installed.
[+] torch version 1.12.1+cu113 installed.
[+] torchvision version 0.13.1+cu113 installed.

Have you read the Readme? Yes
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? Yes
Have you updated Dreambooth to the latest revision? Yes
Have you updated the Stable-Diffusion-WebUI to the latest version? Yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. OK

Describe the bug

When starting training I see the following exception:

CUDA SETUP: CUDA runtime path found: /opt/resolve/libs/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110.so...
Exception importing 8bit adam: libcurand.so.10: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/alex/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 597, in main
    import bitsandbytes as bnb
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 31, in initialize
    self.lib = ct.cdll.LoadLibrary(binary_path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
 Scheduler, EMA Loaded. 
 Allocated: 3.8GB 
 Reserved: 3.9GB

Despite the exception, training seems to work, although I'm not sure whether the results are corrupted by it. In the UI everything appears to work fine: training finishes and produces a ckpt. From what I understand, libcurand.so.10 is a library for CUDA 10, and I have CUDA 12 installed on my system. I'm not sure which CUDA version the Python virtual environment for the webui is using. Also, I have to run this line first: "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/resolve/libs/", otherwise I get the exception "NameError: name 'str2optimizer8bit_blockwise' is not defined" when starting training. I'm not sure whether the export is the proper solution. Is this something that needs to be added to the requirements? Or is a symbolic link missing? I'm not sure, please help.
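One way to narrow this down is to check, outside of the webui, whether the dynamic loader can open the libraries named in the traceback. The sketch below is only a diagnostic aid, not part of the extension or of bitsandbytes; the helper name `can_load` is hypothetical. It uses the same `ctypes` loading mechanism that appears in the traceback. On Linux the loader typically reads `LD_LIBRARY_PATH` when the process starts, so the variable should be exported before launching Python for the check to reflect it.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can find and open the named library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library or one of its dependencies cannot be opened.
        return False


if __name__ == "__main__":
    # Library names taken from the log above.
    for name in ("libcudart.so", "libcurand.so.10"):
        status = "found" if can_load(name) else "NOT found"
        print(f"{name}: {status}")
```

Running this once with and once without the `export` line should show whether the extra directory changes what the loader can resolve.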

Provide logs

cd ~/stable-diffusion-webui/ && git pull && ./webui.sh --enable-insecure-extension-access --api --test-lora --ckptfix
Already up to date.

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on alex user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Commit hash: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490
Installing requirements for Web UI

#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Dreambooth revision: 9e3584f0edd2e64d284b6aaf9580ade5dcceed9d
SD-WebUI revision: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.14.dev0 installed.
[+] torch version 1.12.1+cu113 installed.
[+] torchvision version 0.13.1+cu113 installed.
#######################################################################################################

Launching Web UI with arguments: --enable-insecure-extension-access --api --test-lora --ckptfix --listen --xformers --force-enable-xformers
Dreambooth API layer loaded
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading weights [81761151] from /home/alex/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt
Loading VAE weights from: /home/alex/stable-diffusion-webui/models/Stable-diffusion/vae-ft-mse-840000-ema-pruned.vae.pt
Applying xformers cross attention optimization.
Model loaded.
Loaded a total of 1 textual inversion embeddings.
Embeddings: bad_prompt
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Starting Dreambooth training...
Concept 0 class dir is /home/alex/stable-diffusion-webui/models/dreambooth/Dreambooth_test_01_dog/classifiers_0
 Allocated 0.0/2.0GB 
 Reserved: 0.0/2.0GB 

Initializing dreambooth training...
Patching transformers to fix kwargs errors.
/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/generation_utils.py:24: FutureWarning: Importing `GenerationMixin` from `src/transformers/generation_utils.py` is deprecated and will be removed in Transformers v5. Import as `from transformers import GenerationMixin` instead.
  warnings.warn(
Replace CrossAttention.forward to use xformers
Checking concept: {'max_steps': 10000, 'instance_data_dir': '/home/alex/Documents/Stable Diffusion/Dreambooth/TrainingTest/Dog_01/Source/', 'class_data_dir': '', 'instance_prompt': 'corgidog', 'class_prompt': 'photo of a beautiful dog', 'save_sample_prompt': '', 'save_sample_template': '', 'instance_token': '', 'class_token': '', 'num_class_images': 50, 'class_negative_prompt': 'ugly, disfigured, overlapping, blurred', 'class_guidance_scale': 7.5, 'class_infer_steps': 40, 'save_sample_negative_prompt': '', 'n_save_sample': 1, 'sample_seed': -1, 'save_guidance_scale': 7.5, 'save_infer_steps': 40}
Concept requires 50 images.
Class image dir is not set, defaulting to /home/alex/stable-diffusion-webui/models/dreambooth/Dreambooth_test_01_dog/classifiers_0
Class dir /home/alex/stable-diffusion-webui/models/dreambooth/Dreambooth_test_01_dog/classifiers_0 has 50 images.
 Loaded model. 
 Allocated: 0.0GB 
 Reserved: 0.0GB 

Injecting trainable lora...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /opt/resolve/libs/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110.so...
Exception importing 8bit adam: libcurand.so.10: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/alex/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 597, in main
    import bitsandbytes as bnb
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/home/alex/stable-diffusion-webui/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 31, in initialize
    self.lib = ct.cdll.LoadLibrary(binary_path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
 Scheduler, EMA Loaded. 
 Allocated: 3.8GB 
 Reserved: 3.9GB 

***** Running training *****
  Num examples = 5
  Num batches each epoch = 5
  Num Epochs = 20
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 100
  Actual steps: 100
   Training settings: CPU: False Adam: False, Prec: fp16, Grad: True, TextTr: True EM: False, LR: 1e-05 LORA:True 
 Allocated: 3.8GB 
 Reserved: 3.9GB 

Steps:   0%|                                                                                 | 0/100 [00:01<?, ?it/s, loss=0.258, lr=2e-5, vram=3.9/4.9GB] Step 0 completed. 
 Allocated: 3.9GB 
 Reserved: 4.9GB 

Steps:   5%|███▋                                                                     | 5/100 [00:04<01:06,  1.44it/s, loss=0.112, lr=2e-5, vram=3.9/4.9GB] Step 5 completed. 
 Allocated: 3.9GB 
 Reserved: 4.9GB 

Generating samples: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.94s/it]
Steps: 100%|███████████████████████████████████████████████████████████████████████| 100/100 [01:13<00:00,  1.60it/s, loss=0.139, lr=2e-5, vram=3.9/4.9GB]
Saving lora weights at step 5520
 Allocated 3.9/4.6GB 
 Reserved: 4.0/4.9GB 

Compiling checkpoint for Dreambooth_test_01_dog...
Applying lora weights to unet...
Saving lora unet...
Applying lora weights to text encoder...
Saving lora text encoder...
Saving checkpoint to /home/alex/stable-diffusion-webui/models/Stable-diffusion/Dreambooth_test_01_dog_5520_lora.ckpt...
 CLEANUP:  
 Allocated: 3.9GB 
 Reserved: 4.0GB 

 Cleanup completed. 
 Allocated: 3.9GB 
 Reserved: 4.0GB 

 Cleanup Complete. 
 Allocated: 3.9GB 
 Reserved: 4.0GB 

Steps: 100%|███████████████████████████████████████████████████████████████████████| 100/100 [01:34<00:00,  1.06it/s, loss=0.139, lr=2e-5, vram=3.9/4.9GB]
 Training completed, reloading SD Model. 
 Allocated: 0.0GB 
 Reserved: 3.8GB 

Memory output: {}
 Restored system models. 
 Allocated: 2.0GB 
 Reserved: 3.8GB 

Returning result: Training finished. Total lifetime steps: 5520

Environment

What OS? Ubuntu 22.04 Linux Venus 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
If Windows - WSL or native? No
What GPU are you using? RTX 2080 8GB Driver Version: 525.60.13 CUDA Version: 12.0

Screenshots/Config

If the issue is specific to an error while training, please provide a screenshot of training parameters or the db_config.json file from /models/dreambooth/MODELNAME/db_config.json

db_config.json.log

Thomas-MMJ commented 1 year ago

You need to recompile/reinstall bitsandbytes (it was probably compiled for a different CUDA version, and/or your CUDA path variables aren't set, so it doesn't know where to look).

a-l-e-x-d-s-9 commented 1 year ago

@Thomas-MMJ Thanks for the suggestion. I activated the venv, uninstalled bitsandbytes with "pip3 uninstall bitsandbytes" (it was version 0.35.0), then ran "pip3 install bitsandbytes", which installed version 0.35.4. It did not solve the problem. I also opened a bug for the other problem I mentioned: #535. I'm not sure how exactly the Python virtual environment for the webui coexists with my system with regard to CUDA versions. From what I see, my system uses CUDA 12, but the virtual environment uses libraries for CUDA 11 or 10. Should the webui venv use CUDA 12, or is it limited to 10/11 because of PyTorch or another essential component? Can you explain how I can fix the "cuda path variables"? Should I upgrade other Python packages related to CUDA, or might that break my whole venv for the webui?
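On the question of which CUDA version the venv uses: the startup log reports "torch version 1.12.1+cu113", and the `+cuNNN` suffix of a PyTorch wheel conventionally names the CUDA version the wheel was built against. The sketch below decodes that suffix; the function name `cuda_version_from_torch_tag` is hypothetical and the parsing rule (last digit is the minor version) is an assumption based on that naming convention. Inside the venv, `torch.version.cuda` should report the same value directly.

```python
import re
from typing import Optional


def cuda_version_from_torch_tag(torch_version: str) -> Optional[str]:
    """Decode the CUDA version from a PyTorch version string such as '1.12.1+cu113'.

    Returns None when the string carries no '+cuNNN' suffix (for example CPU builds).
    """
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumed convention: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    # Version string copied from the log in this issue.
    print(cuda_version_from_torch_tag("1.12.1+cu113"))
```

If this reading is right, the venv's PyTorch targets CUDA 11.3 regardless of the "CUDA Version: 12.0" that the driver reports, which would explain why the system and the venv appear to disagree.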

Thomas-MMJ commented 1 year ago

You might have to go to the bitsandbytes repository and read their directions. But yes, you might have to downgrade from CUDA 12 to CUDA 11.8 until they release an update (11.x releases are backwards compatible, and 12 is too, but most repositories haven't been updated to check for CUDA 12.0 yet).

https://github.com/TimDettmers/bitsandbytes

https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md

a-l-e-x-d-s-9 commented 1 year ago

I installed the package cuda-11.8. I still need to use "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/resolve/libs/", but the "Exception importing 8bit adam: libcurand.so.10" exception no longer appears. @Thomas-MMJ Thanks for helping.
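For anyone hitting the same thing: the log in this issue shows bitsandbytes reporting "CUDA runtime path found: /opt/resolve/libs/libcudart.so" and "Detected CUDA version 110", after which it loaded libbitsandbytes_cuda110.so. It may therefore help to see which CUDA runtime files are reachable through `LD_LIBRARY_PATH` and in what order. The sketch below only lists candidates; the function name `find_cudart_candidates` is hypothetical, and it does not reproduce the detection logic bitsandbytes actually uses.

```python
import os
from pathlib import Path
from typing import List


def find_cudart_candidates(search_path: str) -> List[Path]:
    """List libcudart.so* files in each directory of a path list, in search order."""
    hits: List[Path] = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and directories that do not exist.
        if entry and directory.is_dir():
            hits.extend(sorted(directory.glob("libcudart.so*")))
    return hits


if __name__ == "__main__":
    for hit in find_cudart_candidates(os.environ.get("LD_LIBRARY_PATH", "")):
        print(hit)
```

If the first hit belongs to a different CUDA version than the one the venv's PyTorch targets, that mismatch is a plausible (though unconfirmed) cause of errors like the one above.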