Open qingqinggu opened 1 month ago
This would happen if you don't have CUDA.
@qingqinggu This can't be the complete debugging output. Please update your post with the complete output and give us the exact version of bitsandbytes you're using.
python -m bitsandbytes
XGPU-lite: L-229:Client configuration: use_uma:1, compute_schedule_mode:4, need_launch_kernel_admission:0, time_slice_spin_or_cv:1, enable_heart_beat:0, enable_monitor:0.
XGPU-lite: L-163:func: cuInit, pid: 22366, tid: 22366, flags: 0
XGPU-lite: L-163:func: cuInit, pid: 22366, tid: 22366, flags: 0
Could not find the bitsandbytes CUDA binary at PosixPath('/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/libbitsandbytes_cuda121.so')
Could not load bitsandbytes native library: /mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/cextension.py", line 104, in
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 0), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (8, 0).
Library not found: /mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/libbitsandbytes_cuda121.so. Maybe you need to compile it from source?
If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION,
for example, make CUDA_VERSION=113.
The CUDA version for the compile might depend on your conda install, if using conda.
Inspect CUDA version via conda list | grep cuda.
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
The directory listed in your path is found to be non-existent: /home/hadoop/hadoop-data/hadoop-logs
The directory listed in your path is found to be non-existent: //172.16.0.1
The directory listed in your path is found to be non-existent: /hadoop-client/bin/hadoop
The directory listed in your path is found to be non-existent: /root/.tnvm/versions/alinode/v5.20.3/lib/node
The directory listed in your path is found to be non-existent: /root/.local/share/jupyter/ime
The directory listed in your path is found to be non-existent: ErrorFile=/tmp/hs_errpid%p.log
The directory listed in your path is found to be non-existent: reg.docker.alibaba-inc.com/aii/aistudio
The directory listed in your path is found to be non-existent: "/home/admin/logs","Taglist"
The directory listed in your path is found to be non-existent: /var/run/argo/progress
The directory listed in your path is found to be non-existent: //aistudioproxy.alipay.com/proxy/workflow_48860132
The directory listed in your path is found to be non-existent: /dev/shm/nvidia-mps
The directory listed in your path is found to be non-existent: //service-us.odps.aliyun-inc.com/api
The directory listed in your path is found to be non-existent: //service-us.odps.aliyun-inc.com/api
The directory listed in your path is found to be non-existent: /opt/conda/lib/python3.8/site-packages/aistudio_common/reader/libs/penrose-1.0-SNAPSHOT-jar-with-dependencies.jar
The directory listed in your path is found to be non-existent: %s\007" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/~}";sh /etc/sysconfig/bash-prompt-history
The directory listed in your path is found to be non-existent: //service-us.odps.aliyun-inc.com/api
The directory listed in your path is found to be non-existent: //pangu1_analyze_sata_em14_online/pai/aistudio/checkpoint/aistudio-150459928
The directory listed in your path is found to be non-existent: //172.16.0.1
The directory listed in your path is found to be non-existent: /ossfs/.param.conf
Found duplicate CUDA runtime files (see below).
We select the PyTorch default CUDA runtime, which is 12.1,
but this might mismatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122>
environmental variable.
For example, if you want to use the CUDA version 122, BNB_CUDA_VERSION=122 python ...
OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122
In the case of a manual override, make sure you set LD_LIBRARY_PATH, e.g. export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2,
For source installations, compile the binaries with cmake -DCOMPUTE_BACKEND=cuda -S .
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
  File "/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/optim/optimizer.py", line 500, in update_step
    F.optimizer_update_32bit(
  File "/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes/functional.py", line 1588, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
XGPU-lite: L-256:JobId:107729472 pid:22366 disconncect XGPU service ...
XGPU-lite: L-79:-------------------------
XGPU-lite: L-80:Destroy xgpu_client_t
XGPU-lite: L-81:-------------------------
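The log above reports that libbitsandbytes_cuda121.so is missing from the package directory, and that the CPU fallback libbitsandbytes_cpu.so is missing as well. A quick way to confirm which native binaries are actually present is sketched below. The file-name pattern is inferred from the paths in the log (cuda_version_tuple=(12, 1) maps to the suffix 121); the helper names are hypothetical and not part of bitsandbytes.

```python
from pathlib import Path
from typing import List, Tuple


def expected_cuda_library_name(cuda_version: Tuple[int, int]) -> str:
    """Build the file name the log reports as missing for a given CUDA version.

    Assumption: the name is libbitsandbytes_cuda<major><minor>.so, as seen in
    the log (CUDA 12.1 -> libbitsandbytes_cuda121.so).
    """
    major, minor = cuda_version
    return f"libbitsandbytes_cuda{major}{minor}.so"


def find_native_libraries(package_dir: Path) -> List[str]:
    """List every libbitsandbytes_*.so file present in package_dir."""
    return sorted(p.name for p in package_dir.glob("libbitsandbytes_*.so"))


def check_package(package_dir: Path, cuda_version: Tuple[int, int]) -> bool:
    """Print the expected and present binaries; return True if the expected one exists."""
    expected = expected_cuda_library_name(cuda_version)
    present = find_native_libraries(package_dir)
    print(f"expected: {expected}")
    print(f"present:  {present or 'none'}")
    return expected in present
```

For the setup in this issue, the call would be check_package(Path("/mnt_new/zhumuzhi.zmz/code/bitsandbytes-main/bitsandbytes"), (12, 1)). If nothing is listed as present, the source checkout was most likely never compiled, which matches the "Maybe you need to compile it from source?" hint in the log.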
I ran "python -m bitsandbytes" and got:
Traceback (most recent call last):
  File "/xxx/venv/lib/python3.10/site-packages/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/xxx/venv/lib/python3.10/site-packages/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/xxx/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
    out = func(*args, **kwargs)
  File "/xxx/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/x/venv/lib/python3.10/site-packaxxges/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/xxx/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/xxx/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 496, in update_step
    F.optimizer_update_32bit(
  File "/x/venv/lib/python3.10/site-packagxxes/bitsandbytes/functional.py", line 1584, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
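Both tracebacks end in the same NameError, which is probably a symptom of the native library failing to load rather than a separate bug. The sketch below shows the general mechanism, assuming the lookup table is only defined when the library loads. That is a guess about how functional.py is structured, not a quote from it, and the function name make_update_fn and the value "adam_kernel" are hypothetical placeholders.

```python
def make_update_fn(native_available: bool):
    """Build a function whose module-level lookup table exists only if the
    (simulated) native library loaded."""
    namespace = {}
    if native_available:
        # Defined only on the success path; "adam_kernel" is a placeholder value.
        namespace["str2optimizer32bit"] = {"adam": ("adam_kernel",)}

    # The function body always refers to the table, whether or not it was defined.
    code = (
        "def optimizer_update_32bit(optimizer_name):\n"
        "    return str2optimizer32bit[optimizer_name][0]\n"
    )
    exec(code, namespace)
    return namespace["optimizer_update_32bit"]


if __name__ == "__main__":
    print(make_update_fn(True)("adam"))
    try:
        make_update_fn(False)("adam")
    except NameError as err:
        print(f"NameError: {err}")
```

If that assumption holds, fixing the missing-binary problem reported earlier in the output should make the NameError disappear as well.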