bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

libbitsandbytes_cpu.so, libbitsandbytes_cuda124_nocublaslt124.so #1312

Closed magicwang1111 closed 1 month ago

magicwang1111 commented 2 months ago

System Info

Linux, CUDA 12.4

```shell
(flux) [wangxi@v100-4 bitsandbytes]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
(flux) [wangxi@v100-4 bitsandbytes]$
```

```shell
export PATH=/home/wangxi/temp/gcc_11.3.0/bin:$PATH
export LD_LIBRARY_PATH=/home/wangxi/temp/gcc_11.3.0/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/mnt/data/wangxi/cuda-12.4/
export PATH=/mnt/data/wangxi/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/mnt/data/wangxi/cuda-11.8/lib64:$LD_LIBRARY_PATH

export HF_ENDPOINT=https://hf-mirror.com
export CC=gcc
export CXX=g++
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export GUROBI_HOME="/mnt/data/shared_data/gurobi/gurobi1102/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
export GRB_LICENSE_FILE="/mnt/data/shared_data/gurobi.lic"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/data/wangxi/cuda-12.4/targets/x86_64-linux/lib
export BNB_CUDA_VERSION=124
export PATH="/mnt/data/wangxi/cmake/cmake/bin:$PATH"
```

Reproduction

```shell
(flux) [wangxi@v100-4 bitsandbytes]$ python -c "import bitsandbytes as bnb; print(bnb.__version__)"
WARNING: BNB_CUDA_VERSION=124 environment variable detected; loading libbitsandbytes_cuda124_nocublaslt124.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Could not find the bitsandbytes CUDA binary at PosixPath('/home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cuda124_nocublaslt124.so')
Could not load bitsandbytes native library: /home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/wangxi/bitsandbytes/bitsandbytes/cextension.py", line 109, in <module>
    lib = get_native_library()
  File "/home/wangxi/bitsandbytes/bitsandbytes/cextension.py", line 96, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
0.43.2
(flux) [wangxi@v100-4 bitsandbytes]$
```

Expected behavior

```shell
In the case of a manual override, make sure you set LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2,

For source installations, compile the binaries with cmake -DCOMPUTE_BACKEND=cuda -S .
See the documentation for more details if needed.

Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
  File "/home/wangxi/bitsandbytes/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/home/wangxi/bitsandbytes/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/optim/optimizer.py", line 484, in wrapper
    out = func(*args, **kwargs)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/wangxi/bitsandbytes/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/wangxi/bitsandbytes/bitsandbytes/optim/optimizer.py", line 500, in update_step
    F.optimizer_update_32bit(
  File "/home/wangxi/bitsandbytes/bitsandbytes/functional.py", line 1588, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information. Please provide this info when creating an issue
via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
(flux) [wangxi@v100-4 bitsandbytes]$
```
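For reference, the from-source build the diagnostic message refers to looks roughly like the following. This is a sketch based on the messages above, not an official recipe; it assumes you are inside a bitsandbytes checkout with `CUDA_HOME` and the compiler toolchain already on `PATH`:

```shell
# Configure the CUDA backend build (per the diagnostic message above),
# then compile and install the freshly built package.
cmake -DCOMPUTE_BACKEND=cuda -S .
make -j
pip install .

# Re-run the diagnostic to confirm the native library is now found.
python -m bitsandbytes
```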

magicwang1111 commented 2 months ago


Titus-von-Koeller commented 2 months ago

Hey @magicwang1111!

Could you please give me a bit of context? Are you trying to develop on bitsandbytes?

Because if you aren't, then based on the info you provided you wouldn't really need to compile from source; you could just `pip install bitsandbytes`, which supports CUDA 12.4.

In case you really want to or have to (please outline why you think so), did you refer to our docs on compiling from source?

Either way, please give me a bit more context. I see you compiled from source already; otherwise the `libbitsandbytes_cuda124.so` wouldn't show up. CPU binaries can be compiled with `cmake -DCOMPUTE_BACKEND=cpu . && make`; it would also be worth trying that to see if it changes your output of `python -m bitsandbytes`.

matthewdouglas commented 2 months ago

@magicwang1111 It looks like your GPU is a V100? In that case, since there is no int8 tensor core support, you would want to compile with an additional flag: `-DNO_CUBLASLT=1`.

We can see from the log that it is trying to locate `libbitsandbytes_cuda124_nocublaslt124.so`. As @Titus-von-Koeller mentioned, we do build this in the wheels on PyPI, but the names are `libbitsandbytes_cuda124.so` or `libbitsandbytes_cuda124_nocublaslt.so`.
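To make the naming mismatch concrete, here is a small sketch (not the actual bitsandbytes loader code) of how the expected binary name is derived: the CUDA version string plus a `_nocublaslt` suffix for GPUs without int8 tensor cores (compute capability below 7.5). The helper name is hypothetical:

```python
def expected_binary_name(cuda_version: str, compute_capability: tuple) -> str:
    """Sketch of the expected shared-library filename: GPUs with compute
    capability < 7.5 (no int8 tensor cores) get the '_nocublaslt' suffix."""
    suffix = "" if compute_capability >= (7, 5) else "_nocublaslt"
    return f"libbitsandbytes_cuda{cuda_version}{suffix}.so"

# A V100 has compute capability (7, 0), so cuBLASLt is unavailable:
print(expected_binary_name("124", (7, 0)))  # libbitsandbytes_cuda124_nocublaslt.so
# The loader in this issue instead looked for the doubly-suffixed
# libbitsandbytes_cuda124_nocublaslt124.so, which is never built.
```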

There may be a bug here with the extra version-number suffix on the `nocublaslt` filename; if so, a workaround for now would be renaming the built file `libbitsandbytes_cuda124_nocublaslt.so` to `libbitsandbytes_cuda124_nocublaslt124.so`.

magicwang1111 commented 2 months ago

> Hey @magicwang1111!
>
> Could you please give me a bit of context? Are you trying to develop on bitsandbytes?
>
> Because if you aren't, based on the info you provided, you wouldn't really need to compile from source, but could just pip install bitsandbytes, which supports CUDA 12.4.
>
> In case you really want to or have to (please outline why you think that), did you refer to our docs for compilation from source?
>
> Either way, please give me a bit more context. I see you compiled from source already, otherwise the libbitsandbytes_cuda124.so wouldn't show up. CPU binaries can be compiled with `cmake -DCOMPUTE_BACKEND=cpu . && make`, it would also be worth a try to see if that changes your output of `python -m bitsandbytes`.

Yes, I tried compiling bitsandbytes myself because the standard installation through pip didn’t work.

```shell
(flux) [wangxi@v100-4 bitsandbytes]$ pip install .
Processing /home/wangxi/bitsandbytes
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from bitsandbytes==0.43.3) (2.4.0+cu124)
Requirement already satisfied: numpy in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from bitsandbytes==0.43.3) (1.26.0)
Requirement already satisfied: filelock in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (3.13.3)
Requirement already satisfied: typing-extensions>=4.8.0 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (4.11.0)
Requirement already satisfied: sympy in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (1.12)
Requirement already satisfied: networkx in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (3.2.1)
Requirement already satisfied: jinja2 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (3.1.3)
Requirement already satisfied: fsspec in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (2024.3.1)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.99 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.4.99)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.99 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.4.99)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.99 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.4.99)
Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (9.1.0.70)
Requirement already satisfied: nvidia-cublas-cu12==12.4.2.65 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.4.2.65)
Requirement already satisfied: nvidia-cufft-cu12==11.2.0.44 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (11.2.0.44)
Requirement already satisfied: nvidia-curand-cu12==10.3.5.119 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (10.3.5.119)
Requirement already satisfied: nvidia-cusolver-cu12==11.6.0.99 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (11.6.0.99)
Requirement already satisfied: nvidia-cusparse-cu12==12.3.0.142 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.3.0.142)
Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (2.20.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.4.99 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.4.99)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.99 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (12.4.99)
Requirement already satisfied: triton==3.0.0 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from torch->bitsandbytes==0.43.3) (3.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from jinja2->torch->bitsandbytes==0.43.3) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /mnt/data/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages (from sympy->torch->bitsandbytes==0.43.3) (1.3.0)
Building wheels for collected packages: bitsandbytes
  Building wheel for bitsandbytes (pyproject.toml) ... done
  Created wheel for bitsandbytes: filename=bitsandbytes-0.43.3-cp310-cp310-linux_x86_64.whl size=118802 sha256=a013428915190b6301730dfbe332f4c06d081daf4f88a194fd80ac02fe9c448d
  Stored in directory: /tmp/pip-ephem-wheel-cache-irf3e0sn/wheels/3d/80/71/5b85c0feef4f23988820aa9781527a9add4ab40be80ba036e4
Successfully built bitsandbytes
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.43.3
(flux) [wangxi@v100-4 bitsandbytes]$ python -m bitsandbytes
```

```shell
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(7, 0), cuda_version_string='124', cuda_version_tuple=(12, 4))
PyTorch settings found: CUDA_VERSION=124, Highest Compute Capability: (7, 0).
WARNING: BNB_CUDA_VERSION=124 environment variable detected; loading libbitsandbytes_cuda124_nocublaslt124.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Library not found: /home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cuda124_nocublaslt124.so. Maybe you need to compile it from source?
If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION, for example, make CUDA_VERSION=113.
The CUDA version for the compile might depend on your conda install, if using conda. Inspect CUDA version via conda list | grep cuda.
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
The directory listed in your path is found to be non-existent: //hf-mirror.com
Found duplicate CUDA runtime files (see below).

We select the PyTorch default CUDA runtime, which is 12.4, but this might mismatch with the CUDA
version that is needed for bitsandbytes. To override this behavior set the
BNB_CUDA_VERSION=<version string, e.g. 122> environmental variable.

For example, if you want to use the CUDA version 122, BNB_CUDA_VERSION=122 python ...

OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122

In the case of a manual override, make sure you set LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2,

For source installations, compile the binaries with cmake -DCOMPUTE_BACKEND=cuda -S .
See the documentation for more details if needed.

Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
  File "/home/wangxi/bitsandbytes/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/home/wangxi/bitsandbytes/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/optim/optimizer.py", line 484, in wrapper
    out = func(*args, **kwargs)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/wangxi/bitsandbytes/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/wangxi/bitsandbytes/bitsandbytes/optim/optimizer.py", line 500, in update_step
    F.optimizer_update_32bit(
  File "/home/wangxi/bitsandbytes/bitsandbytes/functional.py", line 1588, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information. Please provide this info when creating an issue
via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
(flux) [wangxi@v100-4 bitsandbytes]$ python -c "import bitsandbytes as bnb; print(bnb.__version__)"
WARNING: BNB_CUDA_VERSION=124 environment variable detected; loading libbitsandbytes_cuda124_nocublaslt124.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Could not find the bitsandbytes CUDA binary at PosixPath('/home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cuda124_nocublaslt124.so')
Could not load bitsandbytes native library: /home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/wangxi/bitsandbytes/bitsandbytes/cextension.py", line 109, in <module>
    lib = get_native_library()
  File "/home/wangxi/bitsandbytes/bitsandbytes/cextension.py", line 96, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/wangxi/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
0.43.3
(flux) [wangxi@v100-4 bitsandbytes]$
```