Open guangzlu opened 1 month ago
It seems the torch build in the Docker image targets ROCm 6.0. Please reinstall torch for ROCm 6.1 using the command below, then install bitsandbytes again.
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/
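Before and after reinstalling, it may help to confirm which ROCm series the installed torch build actually targets. The snippet below is a minimal sketch, not part of bitsandbytes: `rocm_series` and `matches_rocm` are hypothetical helper names, and it assumes the leading `major.minor` of `torch.version.hip` tracks the ROCm release the wheel was built for.

```python
# Sketch: report which ROCm series the installed torch wheel was built for.
# Helper names are illustrative only; they do not come from torch or bitsandbytes.

try:
    import torch
except ImportError:  # allow the helpers to be used even without torch installed
    torch = None


def rocm_series(hip_version):
    """Return the 'major.minor' prefix of a HIP version string, or None."""
    if not hip_version:
        return None
    return ".".join(hip_version.split(".")[:2])


def matches_rocm(hip_version, expected="6.1"):
    """True if the HIP version string belongs to the expected ROCm series."""
    return rocm_series(hip_version) == expected


if __name__ == "__main__":
    if torch is None:
        print("torch is not installed in this environment")
    else:
        # torch.version.hip is None on builds that are not ROCm builds.
        hip = getattr(torch.version, "hip", None)
        print("torch version:", torch.__version__)
        print("HIP/ROCm build:", hip)
        print("targets ROCm 6.1:", matches_rocm(hip))
```

If the last line prints `False` inside the rocm6.1 image, the torch wheel and the ROCm runtime are likely mismatched, which is the situation the reinstall command above is meant to fix.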
System Info
Docker image: rocm6.1_ubuntu22.04_py3.10_pytorch_2.4
ROCm: 6.1.0
PyTorch: 2.4
GPU: MI250
Reproduction
Install method:
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S . #Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific gpu arch
make
pip install .
Python script:
import bitsandbytes
Expected behavior
I am following this blog https://rocm.blogs.amd.com/artificial-intelligence/llama2-lora/README.html to fine-tune on an MI250. After I installed bitsandbytes from source and ran the Python script, I got an error.
It reports that it cannot find libbitsandbytes_cpu.so:
OSError: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory.
But when I looked in /opt/conda/envs/py_3.10/lib/python3.10/site-packages/bitsandbytes, I found libbitsandbytes_hip.so there instead.
Is it trying to load the wrong .so file? How can I fix this?
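For anyone reproducing this, the installed native libraries can be listed without importing bitsandbytes (the import is what fails). This is a minimal diagnostic sketch: `find_package_dir` and `list_native_libs` are hypothetical helper names, and the `libbitsandbytes*.so` pattern is an assumption based on the two file names mentioned above.

```python
# Sketch: list the native libraries shipped with an installed bitsandbytes
# without importing the package. Helper names are illustrative only.
import importlib.util
from pathlib import Path


def find_package_dir(name="bitsandbytes"):
    """Locate a top-level package directory without importing the package."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def list_native_libs(package_dir):
    """Return sorted names of libbitsandbytes*.so files in package_dir."""
    return sorted(p.name for p in Path(package_dir).glob("libbitsandbytes*.so"))


if __name__ == "__main__":
    pkg = find_package_dir()
    if pkg is None:
        print("bitsandbytes is not installed in this environment")
    else:
        print("package directory:", pkg)
        for lib in list_native_libs(pkg):
            print("  found:", lib)
```

Note that if this is run from inside the cloned repository, Python may resolve the local `bitsandbytes` source directory rather than the copy in site-packages, so it is safer to run it from another directory.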