Bug report: finds multiple CUDA versions, then none.

Below is the output of python -m bitsandbytes, followed by a listing showing that libcudart exists in the environment's lib folder. I'm not sure how to proceed. I've tried setting the BNB_CUDA_VERSION environment variable to "", "12", "120", and "121". Torch is using CUDA 12.1, so that is the version bitsandbytes should ideally use.

EDIT: added diagnostics from Torch as well.

(cudapytorch) alhq@al-ubuntu:/mnt/9E28E54828E5204F/llama/mydirs$ python -m bitsandbytes
/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning: 

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=12
================================================================================

  warn((f'\n\n{"="*80}\n'
False

===================================BUG REPORT===================================
/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

  warn(msg)
================================================================================
/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/libcudart.so.11.0'), PosixPath('/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/ did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/2475,unix/al-ubuntu'), PosixPath('local/al-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/alhq/.platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-gcc')}
The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/alhq/esp/esp-idf')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda121.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/python3.11/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

(cudapytorch) alhq@al-ubuntu:/mnt/9E28E54828E5204F/llama/mydir$ ls /mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/ | grep libcudart
libcudart.so
libcudart.so.11.0
libcudart.so.11.7.60
libcudart.so.12
libcudart.so.12.1.105
libcudart_static.a
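
To compare a listing like the one above against the file names quoted in the warning, a small check along these lines may help. It is only a sketch: the three names are copied from the "did not contain [...]" message in the log, the function name find_cudart is hypothetical, and it does not reproduce bitsandbytes' own search logic.

```python
import os
from pathlib import Path

# Names copied from the "did not contain [...]" warning in the log above.
SEARCHED_NAMES = ["libcudart.so", "libcudart.so.11.0", "libcudart.so.12.0"]


def find_cudart(search_path, names=SEARCHED_NAMES):
    """Map each existing directory in a colon-separated path to the searched names it contains."""
    found = {}
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries, e.g. from a leading ':'
        directory = Path(entry)
        if not directory.is_dir():
            continue  # non-existent directories are ignored here
        found[str(directory)] = [n for n in names if (directory / n).exists()]
    return found


if __name__ == "__main__":
    result = find_cudart(os.environ.get("LD_LIBRARY_PATH", ""))
    for directory, hits in result.items():
        print(directory, "->", hits or "none of the searched names")
```

Note that in the listing above the CUDA 12 runtime is present as libcudart.so.12, while the name quoted in the warning is libcudart.so.12.0, so an exact-name check like this one would not count it.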

(cudapytorch)$ echo $BNB_CUDA_VERSION
12
(cudapytorch)$ echo $LD_LIBRARY_PATH
/mnt/9E28E54828E5204F/anaconda3/envs/cudapytorch/lib/

torch.cuda.is_available()=True
torch.cuda.get_device_name()=NVIDIA GeForce RTX 3070
torch.cuda.get_device_properties(torch.cuda.current_device())=_CudaDeviceProperties(name='NVIDIA GeForce RTX 3070', major=8, minor=6, total_memory=7940MB, multi_processor_count=46)
torch.cuda.get_device_capability()=(8, 6)
torch.version.cuda=12.1
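
Since the log reports "Required library version not found: libbitsandbytes_cuda121.so", it may also be worth checking which binaries the installed package actually ships. The sketch below assumes the binaries sit directly in the package directory and are named libbitsandbytes*.so, as the paths in the logs suggest; shipped_binaries is a hypothetical helper name. It locates the package with importlib.util.find_spec, so it does not trigger the failing import.

```python
import importlib.util
from pathlib import Path


def shipped_binaries(package="bitsandbytes"):
    """List libbitsandbytes*.so files in the installed package directory without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or not a regular package
    names = []
    for location in spec.submodule_search_locations:
        # Assumption: binaries live at the top level of the package directory.
        names.extend(p.name for p in Path(location).glob("libbitsandbytes*.so"))
    return sorted(names)


if __name__ == "__main__":
    print(shipped_binaries())
```

If libbitsandbytes_cuda121.so is missing from the printed list, that would be consistent with reason 5 in the error output (library not pre-compiled for this release).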

I'm having a similar issue. I have Automatic1111 running just fine on my RTX 4090. I have set up a separate venv to use with kohya-ss, and when I try to start training, it starts out fine, but when bitsandbytes kicks in, it fails to set up CUDA.

Here are my system details:

   kohya_ss   git:(master) ✗ neofetch
tokenwizard@OfficePC
Archcraft x86_64
6.5.5-zen1-1-zen
4 days, 23 hours, 27 mins
1254 (pacman)
zsh 5.9
3440x1440
Openbox
Adapta-Nokto
Arc-Dark [GTK2/3]
Arc-Circle [GTK2/3]
xfce4-terminal
AMD Ryzen 7 3800X (16) @ 3.900GHz
NVIDIA GeForce RTX 4090
18517MiB / 64226MiB

Here is the full kohya-ss output, from the start of training to the error:

13:06:24-525298 INFO     Loading config...                                                                                                                                                                                                                   
13:06:26-206578 INFO     Start training Dreambooth...                                                                                                                                                                                                        
13:06:26-208214 INFO     Valid image folder names found in: /home/tokenwizard/tmp/lmd/lora/img                                                                                                                                                               
13:06:26-209695 INFO     Folder 100_lmd : steps 5000                                                                                                                                                                                                         
13:06:26-210956 INFO     max_train_steps (5000 / 4 / 1 * 1 * 1) = 1250                                                                                                                                                                                       
13:06:26-212455 INFO     stop_text_encoder_training = 0                                                                                                                                                                                                      
13:06:26-213773 INFO     lr_warmup_steps = 0                                                                                                                                                                                                                 
13:06:26-215031 INFO     Saving training config to /home/tokenwizard/tmp/lmd/lora/LMD_RealisticVision51_v1_20231101-130626.json...                                                                                                                           
13:06:26-216684 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="/home/tokenwizard/stable-diffusion-webui/models/Stable-diffusion/1.0
                         - realisticVisionV51_v51VAE.safetensors" --train_data_dir="/home/tokenwizard/tmp/lmd/lora/img" --resolution="512,512" --output_dir="/home/tokenwizard/tmp/lmd/lora" --logging_dir="/home/tokenwizard/tmp/lmd/lora/logs"             
                         --save_model_as=safetensors --output_name="LMD_RealisticVision51_v1" --lr_scheduler_num_cycles="1" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="4"                        
                         --max_train_steps="1250" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2           
                         --keep_tokens="1" --bucket_reso_steps=64 --shuffle_caption --xformers --bucket_no_upscale --noise_offset=0.0                                                                                                                        
2023-11-01 13:06:29.500732: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-01 13:06:29.500772: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-01 13:06:29.500807: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-01 13:06:29.507459: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-01 13:06:30.150140: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
prepare tokenizer
prepare images.
found directory /home/tokenwizard/tmp/lmd/lora/img/100_lmd contains 50 image files
5000 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 4
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "/home/tokenwizard/tmp/lmd/lora/img/100_lmd"
    image_count: 50
    num_repeats: 100
    shuffle_caption: True
    keep_tokens: 1
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: lmd
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 10607.75it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (384, 512), count: 800
bucket 1: resolution (448, 448), count: 600
bucket 2: resolution (448, 512), count: 400
bucket 3: resolution (512, 448), count: 100
bucket 4: resolution (512, 512), count: 3100
mean ar error (without repeats): 0.013517200506358285
prepare accelerator
loading model for process 0/1
load StableDiffusion checkpoint: /home/tokenwizard/stable-diffusion-webui/models/Stable-diffusion/1.0 - realisticVisionV51_v51VAE.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Enable xformers for U-Net
[Dataset 0]
caching latents.
checking cache validity...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 1109604.23it/s]
caching latents...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 27.35it/s]
prepare optimizer, data loader etc.
False

===================================BUG REPORT===================================
/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

  warn(msg)
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64')}
/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/opt/cuda/targets/x86_64-linux/lib/libcudart.so'), PosixPath('/opt/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64::/opt/cuda/lib64:/opt/cuda/targets/x86_64-linux/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Session1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/${LIB}/libgtk3-nocsd.so.0')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Seat0')}
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//debuginfod.archlinux.org ')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary /home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
Traceback (most recent call last):
  File "/home/tokenwizard/kohya_ss/./train_db.py", line 488, in <module>
    train(args)
  File "/home/tokenwizard/kohya_ss/./train_db.py", line 171, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params)
  File "/home/tokenwizard/kohya_ss/library/train_util.py", line 3419, in get_optimizer
    import bitsandbytes as bnb
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
  File "/home/tokenwizard/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/tokenwizard/kohya_ss/venv/bin/python3.10', './train_db.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=/home/tokenwizard/stable-diffusion-webui/models/Stable-diffusion/1.0 - realisticVisionV51_v51VAE.safetensors', '--train_data_dir=/home/tokenwizard/tmp/lmd/lora/img', '--resolution=512,512', '--output_dir=/home/tokenwizard/tmp/lmd/lora', '--logging_dir=/home/tokenwizard/tmp/lmd/lora/logs', '--save_model_as=safetensors', '--output_name=LMD_RealisticVision51_v1', '--lr_scheduler_num_cycles=1', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=4', '--max_train_steps=1250', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--keep_tokens=1', '--bucket_reso_steps=64', '--shuffle_caption', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

Following the instructions in the output, I set BNB_CUDA_VERSION to 118, which matches the CUDA version we are using for kohya-ss, and I also exported my LD_LIBRARY_PATH:

   kohya_ss   git:(master) ✗ echo $LD_LIBRARY_PATH
:/opt/cuda/lib64:/opt/cuda/targets/x86_64-linux/lib
   kohya_ss   git:(master) ✗ echo $BNB_CUDA_VERSION
118

As you can see, the libraries are there in the path:

   kohya_ss   git:(master) ✗ ls /opt/cuda/targets/x86_64-linux/lib | grep libcudart.so
libcudart.so
libcudart.so.12
libcudart.so.12.2.53

So I'm unclear why bitsandbytes is complaining about the libraries not being there:

/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64::/opt/cuda/lib64:/opt/cuda/targets/x86_64-linux/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
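
Separately from the libcudart warning, the log also contains a loader error for libcusparse.so.11, while the listing above shows only version 12 files under /opt/cuda. A quick way to see which shared-library names the dynamic loader can resolve is to try opening them with ctypes. This is a sketch only: try_load is a hypothetical helper, and the two names in the example call are taken from the log output, not from any official list of bitsandbytes dependencies.

```python
import ctypes


def try_load(sonames):
    """Try to open each shared-library name; map it to None on success or the error text on failure."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Names copied from the log: one searched libcudart name and the library in the loader error.
    report = try_load(["libcudart.so.11.0", "libcusparse.so.11"])
    for name, error in report.items():
        print(name, "OK" if error is None else f"FAILED: {error}")
```

If libcusparse.so.11 fails here as well, that suggests the CUDA 11.8 binary cannot find its CUDA 11 dependencies in the current LD_LIBRARY_PATH, regardless of where libcudart.so is.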

Here is the full bug report output from bitsandbytes:

(KohyaSS)    kohya_ss   git:(master) ✗ python -m bitsandbytes 
False

===================================BUG REPORT===================================
/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

  warn(msg)
================================================================================
/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/opt/cuda/targets/x86_64-linux/lib/libcudart.so'), PosixPath('/opt/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: :/opt/cuda/lib64:/opt/cuda/targets/x86_64-linux/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Session1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Seat0')}
The following directories listed in your path were found to be non-existent: {PosixPath('//debuginfod.archlinux.org '), PosixPath('https')}
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/${LIB}/libgtk3-nocsd.so.0')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary /home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/tokenwizard/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

bitsandbytes-foundation / bitsandbytes, issue #829