bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

can not detect cuda #923

Open Dexter-GT-86 opened 9 months ago

Dexter-GT-86 commented 9 months ago

(longqloraenv) dexter@mu00153612L:~/zhy/LongQLoRA/script/evaluate$ python -m bitsandbytes

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda123.so
False
/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/dexter/miniconda3/envs/longqloraenv did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-12.3/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 123
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123 python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/dexter/miniconda3/envs/longqloraenv/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command to get more information:

    python -m bitsandbytes

    Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
    to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
    and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
younesbelkada commented 9 months ago

It seems to be an issue with CUDA 12.3. In my Docker images I use CUDA 12.1 and it works fine. Can you try with CUDA 12.1? You can also compile from source:

CUDA_VERSION=121 make cuda12x
python setup.py develop

cc @Titus-von-Koeller in case I missed anything
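
One way to confirm this before rebuilding is to list which precompiled binaries the installed package ships and check for the one named in the error. The snippet below is only a sketch: the helper names (`cuda_tag`, `shipped_cuda_binaries`, `has_binary_for`) are made up for illustration, and it assumes the `libbitsandbytes_cuda<version>.so` naming seen in the log above. It locates the package with `importlib.util.find_spec`, so bitsandbytes itself is never imported (importing it is what raises the RuntimeError).

```python
import importlib.util
from pathlib import Path


def cuda_tag(version: str) -> str:
    """Turn '12.3' or '12.3.2' into '123', the suffix used in the library name."""
    major, minor = version.split(".")[:2]
    return f"{major}{minor}"


def shipped_cuda_binaries(package_dir) -> list:
    """Return the sorted names of libbitsandbytes_cuda*.so files in package_dir."""
    return sorted(p.name for p in Path(package_dir).glob("libbitsandbytes_cuda*.so"))


def has_binary_for(package_dir, cuda_version: str) -> bool:
    """True if the binary for cuda_version (e.g. '12.3') is present."""
    wanted = f"libbitsandbytes_cuda{cuda_tag(cuda_version)}.so"
    return wanted in shipped_cuda_binaries(package_dir)


if __name__ == "__main__":
    # find_spec locates the package on disk without executing its __init__.py.
    spec = importlib.util.find_spec("bitsandbytes")
    if spec is None or not spec.submodule_search_locations:
        print("bitsandbytes is not installed in this environment")
    else:
        pkg_dir = list(spec.submodule_search_locations)[0]
        print("shipped:", shipped_cuda_binaries(pkg_dir))
        # "12.3" is the version from the report above; substitute your own.
        print("has cuda123 binary:", has_binary_for(pkg_dir, "12.3"))
```

If the binary for the detected version is missing from the list, that matches the "Required library version not found" message, and either switching to a CUDA version that is listed or compiling from source should be the next step.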

schiffy91 commented 9 months ago

@younesbelkada you're right: everything works when I use nvcr.io/nvidia/pytorch:23.04-py3, but not when I use nvcr.io/nvidia/pytorch:23.12-py3 (CUDA 12.1.0 vs. CUDA 12.3.2). I get the same errors as above.
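
When comparing two environments like these containers, it can help to reduce the `python -m bitsandbytes` output to the few fields that differ. Below is a rough sketch of that idea, not part of bitsandbytes: `summarize_cuda_setup` is a hypothetical helper, and its patterns are copied from the "CUDA SETUP" lines in the report at the top of this thread, so they may not match other releases.

```python
import re

# Patterns taken from the "CUDA SETUP" lines in the report above (assumed stable).
_PATTERNS = {
    "runtime_path": r"CUDA runtime path found: (\S+)",
    "compute_capability": r"Highest compute capability among GPUs detected: (\d+\.\d+)",
    "cuda_version": r"Detected CUDA version (\d+)",
    "missing_library": r"Required library version not found: (\S+\.so)",
}


def summarize_cuda_setup(log_text: str) -> dict:
    """Extract key fields from the diagnostic output; missing fields map to None."""
    summary = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, log_text)
        summary[key] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    # Sample lines copied from the report above.
    sample = (
        "CUDA SETUP: CUDA runtime path found: /usr/local/cuda-12.3/lib64/libcudart.so\n"
        "CUDA SETUP: Highest compute capability among GPUs detected: 8.6\n"
        "CUDA SETUP: Detected CUDA version 123\n"
        "CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. "
        "Maybe you need to compile it from source?\n"
    )
    for field, value in summarize_cuda_setup(sample).items():
        print(f"{field}: {value}")
```

Running the same summary in both containers should make the difference visible at a glance: a working setup would be expected to report no missing library.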