bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License
6.3k stars, 630 forks

CUDA SETUP: CUDA detection failed! #931

Open choiszt opened 11 months ago

choiszt commented 11 months ago

Hi! I am encountering an issue while trying to install the bitsandbytes package on my Ubuntu device. I have tried installing it both through pip (pip install bitsandbytes) and by building it from source with CUDA 11.7 using the following commands:

CUDA_VERSION=117 make cuda11x
python setup.py develop

However, I consistently encounter the following error:

CUDA SETUP: Setup Failed!

System Details:
Linux version: CentOS Linux 7
CUDA version installed: 11.7
CUDA-related libraries detected: libcudart.so and libcudart.so.11.0, both in /usr/local/cuda-11.7/lib64 and /usr/local/cuda/lib64

Additionally, I have set up my environment variables as follows in my ~/.bashrc file:

export LANGUAGE="en_US.UTF-8"
export LANG=en_US:zh_CN.UTF-8
export LC_ALL=C
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
export BNB_CUDA_VERSION=117

I would appreciate any guidance or assistance in resolving this installation problem with CUDA and the bitsandbytes package. Thank you!

agokrani commented 11 months ago

I am having a similar issue on Google Colab with CUDA version 122. This seems to be a recent issue; it was working until a few days ago.

soccerbob97 commented 11 months ago

I also hit this issue on Google Colab with CUDA version 12.2.

bcallonnec commented 11 months ago

In Colab I had the same issue.

The command !nvcc --version reports CUDA version 12.2, but torch.version.cuda reports 11.7.

So I followed the recommendations of this post to install CUDA 11.7.
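
The mismatch described here (the toolkit on PATH versus the CUDA version PyTorch was built against) can be checked in one place. The following is a minimal sketch, not part of bitsandbytes; the helper names are made up for illustration, and it assumes that nvcc prints a "release X.Y" string in its version output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return 'major.minor' from `nvcc --version` text, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def toolkit_cuda_version():
    """CUDA version reported by the first nvcc on PATH, or None if no nvcc."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """CUDA version PyTorch was built with; None for CPU builds or no torch."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def versions_match(toolkit, torch_cuda):
    """True only when both versions are known and identical."""
    return toolkit is not None and toolkit == torch_cuda


if __name__ == "__main__":
    toolkit = toolkit_cuda_version()
    torch_cuda = torch_cuda_version()
    print("nvcc toolkit version:", toolkit)
    print("torch.version.cuda:  ", torch_cuda)
    if not versions_match(toolkit, torch_cuda):
        print("Versions differ or are unknown; this may explain the setup failure.")
```

If the two values differ, the replies in this thread suggest installing the toolkit that matches torch.version.cuda rather than the other way around.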

agokrani commented 11 months ago

Hi @bcallonnec, for me torch.version.cuda outputs 12.1. I'm not really sure whether installing cuda-11.7 will solve the issue, but I will give it a try.

bcallonnec commented 11 months ago

If torch.version.cuda outputs 12.1, then try installing the CUDA toolkit for CUDA 12.1 and then run python -m bitsandbytes.

DLYLL commented 11 months ago

1) Check the torch CUDA version, e.g. print(torch.version.cuda) gives 11.8
2) Then follow https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md with:
2.1) wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
2.2) bash install_cuda.sh 118 /usr/local 1
3) Compile for CUDA if needed: CUDA_HOME=/usr/local/cuda-11.8 CUDA_VERSION=118 make cuda11x

Finally, test nvcc with "nvcc --version"; if that is successful, run "python -m bitsandbytes".

it works for me!
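
The steps above pair a dotted version such as 11.8 with the tag 118 and the make target cuda11x. A small sketch of that mapping follows, assuming the naming pattern seen in this thread. The helper names are hypothetical, and the target name for other major versions (for example cuda12x) is an extrapolation that should be checked against the Makefile.

```python
def cuda_version_tag(version):
    """Turn a dotted CUDA version like '11.8' into the tag '118'."""
    major, minor = version.split(".")[:2]
    return f"{int(major)}{int(minor)}"


def make_target(version):
    """Make target following the thread's pattern, e.g. '11.8' -> 'cuda11x'.

    Assumption: targets for other major versions follow the same pattern.
    """
    major = version.split(".")[0]
    return f"cuda{int(major)}x"


def build_command(version, cuda_root="/usr/local"):
    """Compose the compile command from step 3 for a given CUDA version."""
    major, minor = version.split(".")[:2]
    home = f"{cuda_root}/cuda-{int(major)}.{int(minor)}"
    tag = cuda_version_tag(version)
    return f"CUDA_HOME={home} CUDA_VERSION={tag} make {make_target(version)}"


if __name__ == "__main__":
    # '11.8' stands in for whatever torch.version.cuda reports.
    print(build_command("11.8"))
```

Passing the value of torch.version.cuda keeps the compiled library aligned with the CUDA version PyTorch expects.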

rmib200 commented 11 months ago

I am having this issue when I run the install_cuda file. The error says ERROR: toolkitpath: path must be absolute. But my path is absolute as far as I know. This is my path (running on my local computer): C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA

Please help, this issue is really blocking me.
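
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: install_cuda.sh is a bash script, and tools that follow POSIX path rules only treat paths starting with / as absolute. The sketch below uses Python's pathlib to show how the same string is classified under each convention; the helper names are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_posix(path):
    """True if the path is absolute under POSIX rules (starts with '/')."""
    return PurePosixPath(path).is_absolute()


def is_absolute_windows(path):
    """True if the path is absolute under Windows rules (drive plus root)."""
    return PureWindowsPath(path).is_absolute()


if __name__ == "__main__":
    toolkit_path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA"
    print("absolute on Windows:", is_absolute_windows(toolkit_path))
    print("absolute on POSIX:  ", is_absolute_posix(toolkit_path))
```

If that reading is right, the path would need to be given in the form the script's environment expects, such as /usr/local on Linux as in the earlier reply.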

ninjacode01 commented 10 months ago

@DLYLL
make cuda11x gives this:

make cuda11x CUDA_VERSION=118

ENVIRONMENT
============================
CUDA_VERSION: 118
============================
NVCC path: /home/scai/phd/aiz238140/.conda/envs/llava2/bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
CUDA_HOME: /home/scai/phd/aiz238140/.conda/envs/llava2
CONDA_PREFIX: /home/scai/phd/aiz238140/.conda/envs/llava2
PATH: /home/scai/phd/aiz238140/.conda/envs/llava2/bin:/home/apps/anaconda3_2018/4.6.9/bin:/home/apps/anaconda3_2018/4.6.9/condabin:/usr/lib64/qt-3.3/bin:/usr/share/Modules/4.4.1/bin:/usr/local/cuda-11.8/bin:/opt/pbs/default/bin:/opt/pbs/default/sbin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/2022.1.3/bin:/home/scai/phd/aiz238140/bin
LD_LIBRARY_PATH: /usr/local/cuda-11.8/lib64{LD_LIBRARY_PATH:+:}:/home/soft/centOS/lib/gnu/tcl/8.4.20/lib

It fails with either a g++ error or an nvcc error. My g++ version is 7.3.1, so the versions should be fine. I guess there is some problem with the path.

Note that I am using a shared cluster, so I do not have sudo permissions.
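
The log above reports /usr/bin/g++ as GCC 4.8.5 while the comment mentions 7.3.1, which suggests more than one compiler may be installed. A minimal sketch for listing every g++ and nvcc on PATH in lookup order follows; the helper name is made up for illustration and nothing here is specific to bitsandbytes.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("g++", "nvcc"):
        print(tool)
        for location in find_all_on_path(tool):
            print("   ", location)
```

The first entry listed for each tool is the one a bare g++ or nvcc call resolves to. Placing the directory of the intended compiler earlier in PATH changes that choice without needing sudo.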