Runtime error: CUDA Setup failed despite GPU being available (bitsandbytes) #1280

Open alistairwgillespie opened 9 months ago

alistairwgillespie commented 9 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

Hi, I'm trying the public cloud example that trains Mistral on AWS, expecting a training run to spin up and complete. Instead, I get the CUDA error below. I've modified the config to use a single spot V100. In my testing, I've tried the latest image versions and both the winglian/axolotl and winglian/axolotl-cloud image sources, neither of which helped.

Current behaviour

(axolotl, pid=29373) ================================================================================
(axolotl, pid=29373) WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
(axolotl, pid=29373) BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
(axolotl, pid=29373) If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
(axolotl, pid=29373) If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
(axolotl, pid=29373) For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
(axolotl, pid=29373) Loading CUDA version: BNB_CUDA_VERSION=118
(axolotl, pid=29373) ================================================================================
(axolotl, pid=29373) 
(axolotl, pid=29373) 
(axolotl, pid=29373)   warn((f'\n\n{"="*80}\n'
(axolotl, pid=29373) /root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run
(axolotl, pid=29373) 
(axolotl, pid=29373) python -m bitsandbytes
(axolotl, pid=29373) 
(axolotl, pid=29373) 
(axolotl, pid=29373)   warn(msg)
(axolotl, pid=29373) /root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
(axolotl, pid=29373)   warn(msg)
(axolotl, pid=29373) /root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
(axolotl, pid=29373)   warn(msg)
(axolotl, pid=29373) /root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
(axolotl, pid=29373)   warn(msg)
(axolotl, pid=29373) Traceback (most recent call last):
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 187, in _run_module_as_main
(axolotl, pid=29373)     mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 110, in _get_module_details
(axolotl, pid=29373)     __import__(pkg_name)
(axolotl, pid=29373)   File "/workspace/axolotl/src/axolotl/cli/__init__.py", line 24, in <module>
(axolotl, pid=29373)     from axolotl.common.cli import TrainerCliArgs, load_model_and_tokenizer
(axolotl, pid=29373)   File "/workspace/axolotl/src/axolotl/common/cli.py", line 12, in <module>
(axolotl, pid=29373)     from axolotl.utils.models import load_model, load_tokenizer
(axolotl, pid=29373)   File "/workspace/axolotl/src/axolotl/utils/models.py", line 8, in <module>
(axolotl, pid=29373)     import bitsandbytes as bnb
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
(axolotl, pid=29373)     from . import cuda_setup, utils, research
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
(axolotl, pid=29373)     from . import nn
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
(axolotl, pid=29373)     from .modules import LinearFP8Mixed, LinearFP8Global
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
(axolotl, pid=29373)     from bitsandbytes.optim import GlobalOptimManager
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
(axolotl, pid=29373)     from bitsandbytes.cextension import COMPILED_WITH_CUDA
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
(axolotl, pid=29373)     raise RuntimeError('''
(axolotl, pid=29373) RuntimeError: 
(axolotl, pid=29373)         CUDA Setup failed despite GPU being available. Please run the following command to get more information:
(axolotl, pid=29373) 
(axolotl, pid=29373)         python -m bitsandbytes
(axolotl, pid=29373) 
(axolotl, pid=29373)         Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
(axolotl, pid=29373)         to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
(axolotl, pid=29373)         and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
(axolotl, pid=29373) False
(axolotl, pid=29373) 
(axolotl, pid=29373) ===================================BUG REPORT===================================
(axolotl, pid=29373) ================================================================================
(axolotl, pid=29373) The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
(axolotl, pid=29373) The following directories listed in your path were found to be non-existent: {PosixPath('/workspace/data/huggingface-cache/datasets')}
(axolotl, pid=29373) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
(axolotl, pid=29373) DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
(axolotl, pid=29373) CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 7.0.
(axolotl, pid=29373) CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
(axolotl, pid=29373) CUDA SETUP: Required library version not found: libbitsandbytes_cuda118_nocublaslt.so. Maybe you need to compile it from source?
(axolotl, pid=29373) CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
(axolotl, pid=29373) 
(axolotl, pid=29373) ================================================ERROR=====================================
(axolotl, pid=29373) CUDA SETUP: CUDA detection failed! Possible reasons:
(axolotl, pid=29373) 1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
(axolotl, pid=29373) 2. CUDA driver not installed
(axolotl, pid=29373) 3. CUDA not installed
(axolotl, pid=29373) 4. You have multiple conflicting CUDA libraries
(axolotl, pid=29373) 5. Required library not pre-compiled for this bitsandbytes release!
(axolotl, pid=29373) CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
(axolotl, pid=29373) CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
(axolotl, pid=29373) ================================================================================
(axolotl, pid=29373) 
(axolotl, pid=29373) CUDA SETUP: Something unexpected happened. Please compile from source:
(axolotl, pid=29373) git clone https://github.com/TimDettmers/bitsandbytes.git
(axolotl, pid=29373) cd bitsandbytes
(axolotl, pid=29373) CUDA_VERSION=118 make cuda11x_nomatmul
(axolotl, pid=29373) python setup.py install
(axolotl, pid=29373) CUDA SETUP: Setup Failed!
(axolotl, pid=29373) Traceback (most recent call last):
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
(axolotl, pid=29373)     sys.exit(main())
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
(axolotl, pid=29373)     args.func(args)
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
(axolotl, pid=29373)     simple_launcher(args)
(axolotl, pid=29373)   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
(axolotl, pid=29373)     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
(axolotl, pid=29373) subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.10/bin/python3', '-m', 'axolotl.cli.train', '/sky_workdir/qlora-checkpoint.yaml']' returned non-zero exit status 1.
ERROR: Job 1 failed with return code list: [1]
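From the log, three things stand out: BNB_CUDA_VERSION=118 is manually overriding bitsandbytes' CUDA detection, libcudart.so is only found outside LD_LIBRARY_PATH (in /usr/local/cuda/lib64), and the V100's compute capability 7.0 requires the libbitsandbytes_cuda118_nocublaslt.so binary, which this build does not ship. A minimal diagnostic sketch, run inside the container, using only the paths and commands the log itself mentions:

  # Drop the manual override the log warns about.
  unset BNB_CUDA_VERSION
  # Add the directory where libcudart.so was actually found to the search path.
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
  # Run bitsandbytes' own self-check, as the error message suggests.
  python -m bitsandbytes

If the self-check still fails on the missing nocublaslt library, the compile-from-source instructions printed in the log above are the remaining option.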

Steps to reproduce

Steps:

  1. pip install "skypilot-nightly[gcp,aws,azure,oci,lambda,kubernetes,ibm,scp]" # choose your clouds
  2. sky check
  3. git clone https://github.com/skypilot-org/skypilot.git
  4. cd skypilot/llm/axolotl
  5. HF_TOKEN="" BUCKET="" sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET

Config yaml

name: axolotl

resources:
  accelerators: V100:1
  cloud: aws  # optional
  use_spot: True

workdir: mistral

file_mounts:
  /sky-notebook:
    name: ${BUCKET}
    mode: MOUNT

setup: |
  docker pull winglian/axolotl-cloud:main-py3.10-cu118-2.1.2

run: |
  docker run --gpus all \
    -v ~/sky_workdir:/sky_workdir \
    -v /root/.cache:/root/.cache \
    winglian/axolotl-cloud:main-py3.10-cu118-2.1.2 \
    huggingface-cli login --token ${HF_TOKEN}

  docker run --gpus all \
    -v ~/sky_workdir:/sky_workdir \
    -v /root/.cache:/root/.cache \
    -v /sky-notebook:/sky-notebook \
    winglian/axolotl-cloud:main-py3.10-cu118-2.1.2 \
    accelerate launch -m axolotl.cli.train /sky_workdir/lora.yaml

envs:
  HF_TOKEN: # TODO: Replace with huggingface token
  BUCKET:

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

NanoCode012 commented 9 months ago

Are you able to test with a newer GPU? I do not remember if bnb works well with V100.
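For reference, the log above reports "Highest Compute Capability: 7.0", which matches the V100; bitsandbytes only ships fast 8-bit matmul kernels for compute capability 7.5 and up, hence the nocublaslt fallback. A quick sketch to confirm what an instance reports, using PyTorch's public API:

  # Prints the (major, minor) compute capability of the default GPU; a V100 prints (7, 0).
  python -c "import torch; print(torch.cuda.get_device_capability())"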

alistairwgillespie commented 9 months ago

@NanoCode012 Any suggested accelerators? A100s are difficult to get hold of on AWS. Thanks

NanoCode012 commented 9 months ago

A6000 or L4. You may also try some alternative providers, as AWS is quite expensive.
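If you try those, one possible tweak to the resources block in the config above (a sketch, assuming SkyPilot's candidate-accelerator set syntax; availability of these GPUs varies by cloud):

  resources:
    # Candidate set: SkyPilot provisions whichever of these it can get.
    accelerators: {A6000:1, L4:1}
    use_spot: True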