Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese llama+lora solution whose structure follows alpaca)
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0
4.14k stars, 422 forks

Error when running bash chat.sh; it seems others have hit this too #64

Open xiaoaidafu opened 1 year ago

xiaoaidafu commented 1 year ago

Environment: Ubuntu 20; GPU: M40 24G; RAM: 64G; CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz

During installation I got the following error: libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats. I worked around it by copying libbitsandbytes_cuda117.so and naming the copy libbitsandbytes_cpu.so.

However, the following error appears at runtime:

$ bash chat.sh
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cpu.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 5.2
CUDA SETUP: Detected CUDA version 117
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cpu.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Chinese-Vicuna/Chinese-Vicuna-lora-7b-belle-and-guanaco/adapter_model.bin
Loading checkpoint shards: 0%| | 0/33 [00:00<?, ?it/s]
Error named symbol not found at line 530 in file /home/tim/git/bitsandbytes/csrc/ops.cu

Facico commented 1 year ago

You can take a look at this issue.

Facico commented 1 year ago

@xiaoaidafu You can try our latest chat.sh; the original script had a small problem. See this issue.

xiaoaidafu commented 1 year ago

@Facico ok!

xiaoaidafu commented 1 year ago

> @xiaoaidafu You can try our latest chat.sh; the original script had a small problem. See this issue.

I built bitsandbytes from source and copied the compiled .so file into the corresponding directory, but startup still fails. The error is as follows:

bash chat.sh

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cpu.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cpu.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Chinese-Vicuna/Chinese-Vicuna-lora-7b-belle-and-guanaco/adapter_model.bin
Loading checkpoint shards: 0%| | 0/33 [00:00<?, ?it/s]
Error named symbol not found at line 528 in file /opt/github_src/bitsandbytes/csrc/ops.cu

There is one obvious difference, though: this time the path in the error is the one from my source build: /opt/github_src/bitsandbytes/csrc/ops.cu

Facico commented 1 year ago

Your current problem is similar to several other issues, such as issue1 and issue2. The cause is most likely in your GPU environment.

dcaczg commented 1 year ago

@xiaoaidafu I have the same hardware as you (M40 24G / CPU E5-2680 v4). When I hit the undefined symbol: cget_col_row_stats error at the first step, downloading and installing CUDA 11.6 fixed it.

twang2218 commented 1 year ago

I ran into this problem too. Here is what I observed.

I checked every .so file under ~/miniconda/envs/vicuna/lib/python3.10/site-packages/bitsandbytes to see whether it contains the symbol cget_col_row_stats:

for file in *.so; do
     if nm "$file" | grep -q "cget_col_row_stats"; then
         echo "- [✅] $file";
     else
         echo "- [❌]  $file";
     fi;
done

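The results were as follows:

All of the CUDA libraries contain the cget_col_row_stats symbol, while the CPU library does not. Someone suggested simply overwriting the CPU file with the CUDA file of the matching version. I tried that, and it did resolve this error in my environment. That suggests the real problem lies in the initial check that decides whether to use the GPU build.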
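For reference, the same symbol check can be sketched in Python. This is a minimal sketch, assuming binutils' nm is on PATH. The helper names exports_symbol and check_libraries are hypothetical and are not part of bitsandbytes. Unlike the shell loop above, it calls nm -D (the dynamic symbol table) and ignores undefined references; both choices are assumptions of this sketch.

```python
import subprocess
from pathlib import Path

SYMBOL = "cget_col_row_stats"


def exports_symbol(nm_output: str, symbol: str = SYMBOL) -> bool:
    """Return True if nm output lists `symbol` as a defined symbol."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols look like "<address> <type> <name>";
        # undefined references look like "U <name>" and are skipped.
        if len(fields) >= 2 and fields[-1] == symbol and fields[-2] != "U":
            return True
    return False


def check_libraries(directory: str) -> dict:
    """Run `nm -D` on every .so file in `directory` (hypothetical helper)."""
    results = {}
    for lib in sorted(Path(directory).glob("*.so")):
        proc = subprocess.run(
            ["nm", "-D", str(lib)], capture_output=True, text=True
        )
        results[lib.name] = exports_symbol(proc.stdout)
    return results
```

Calling check_libraries(".") from inside the bitsandbytes package directory returns a mapping from each library file name to True or False.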

Running python -m bitsandbytes to troubleshoot gives the following information:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/root/miniconda/envs/vicuna/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++ DEBUG INFORMATION +++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++ POTENTIALLY LIBRARY-PATH-LIKE ENV VARS ++++++++++
'CONDA_EXE': '/root/miniconda/bin/conda'
'CONDA_PREFIX': '/root/miniconda/envs/vicuna'
'CONDA_PYTHON_EXE': '/root/miniconda/bin/python'
'CONDA_PREFIX_1': '/root/miniconda'
'CONDA_PREFIX_2': '/root/miniconda/envs/model'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

WARNING: Please be sure to sanitize sensible info from any such env vars!

++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = False
COMPUTE_CAPABILITIES_PER_GPU = ['8.6']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Running a quick check that:
    + library is importable
    + CUDA function is callable

name 'str2optimizer32bit' is not defined

Above we output some debug information. Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose ...

Reading the bitsandbytes code shows that the file it searches for is libcudart.so:

https://github.com/TimDettmers/bitsandbytes/blob/0.37.0/bitsandbytes/cuda_setup/main.py#L228-L238

    if "CONDA_PREFIX" in candidate_env_vars:
        conda_libs_path = Path(candidate_env_vars["CONDA_PREFIX"]) / "lib"

        conda_cuda_libs = find_cuda_lib_in(str(conda_libs_path))
        warn_in_case_of_duplicates(conda_cuda_libs)

        if conda_cuda_libs:
            return next(iter(conda_cuda_libs))

        CUDASetup.get_instance().add_log_entry(f'{candidate_env_vars["CONDA_PREFIX"]} did not contain '
            f'{CUDA_RUNTIME_LIB} as expected! Searching further paths...', is_warning=True)

Searching for libcudart.so locally:

$ find $CONDA_PREFIX | grep libcudart.so
/root/miniconda/envs/vicuna/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.11.0

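The code shows that bitsandbytes locates this file through environment variables, the most reliable being CONDA_PREFIX and LD_LIBRARY_PATH. However, because the file sits in a deeper directory, the lookup fails.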
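The failed lookup can be illustrated with a short sketch. It assumes, as described above, that the check only accepts a file named exactly libcudart.so directly inside $CONDA_PREFIX/lib. The helper names exact_lookup and recursive_lookup are hypothetical and are not bitsandbytes functions.

```python
from pathlib import Path


def exact_lookup(lib_dir: Path, name: str = "libcudart.so") -> bool:
    """Non-recursive check: only lib_dir/name itself counts as a hit."""
    return (Path(lib_dir) / name).is_file()


def recursive_lookup(root: Path, pattern: str = "libcudart.so*") -> list:
    """Find every matching runtime library anywhere below root."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def report(prefix: str) -> None:
    """Print what each lookup strategy finds under a conda prefix."""
    root = Path(prefix)
    print("exact match in lib/:", exact_lookup(root / "lib"))
    for path in recursive_lookup(root):
        print("found deeper:", path)
```

Calling report(os.environ["CONDA_PREFIX"]) (after import os) in an affected environment should show no exact match while still listing the copy shipped inside site-packages/nvidia.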

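There are two ways to fix this: link the file into the $CONDA_PREFIX/lib directory, or set LD_LIBRARY_PATH to point at the directory that contains the library.

The first approach, setting LD_LIBRARY_PATH: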

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cuda_runtime/lib

This approach requires reinstalling bitsandbytes at version 0.39.0, together with the scipy it depends on, because 0.37.0 only searches for libcudart.so and will not match libcudart.so.11.0.
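The export line above replaces any existing LD_LIBRARY_PATH value. If other entries need to be kept, the value can be composed first. The following is a minimal sketch; build_ld_library_path and nvidia_lib_dirs are hypothetical helpers, and the directory layout is assumed to match the paths shown in this thread.

```python
from pathlib import Path


def build_ld_library_path(lib_dirs, current=None):
    """Prepend lib_dirs to an existing LD_LIBRARY_PATH value, skipping duplicates."""
    existing = current.split(":") if current else []
    parts = []
    for entry in [str(d) for d in lib_dirs] + existing:
        if entry and entry not in parts:
            parts.append(entry)
    return ":".join(parts)


def nvidia_lib_dirs(prefix, python_version="3.10", packages=("cuda_runtime",)):
    """Directories where the pip-installed nvidia wheels keep their shared libraries.

    The layout is an assumption taken from the paths quoted in this thread.
    """
    site_packages = Path(prefix) / "lib" / f"python{python_version}" / "site-packages"
    return [site_packages / "nvidia" / pkg / "lib" for pkg in packages]
```

Printing "export LD_LIBRARY_PATH=" followed by build_ld_library_path(nvidia_lib_dirs(prefix), os.environ.get("LD_LIBRARY_PATH")) gives a line to paste into the shell. The variable has to be exported before Python starts, since the dynamic loader reads it at process startup.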

The second approach, linking:

ln -s $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.11.0 $CONDA_PREFIX/lib/libcudart.so
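A slightly more defensive version of the ln -s command above can be sketched in Python. This is only an illustration: link_library is a hypothetical helper that refuses to link a missing source and leaves any existing file or link untouched.

```python
import os
from pathlib import Path


def link_library(source, link_name):
    """Create link_name -> source; return True if a new link was created."""
    source, link_name = Path(source), Path(link_name)
    if not source.is_file():
        raise FileNotFoundError(f"library not found: {source}")
    if link_name.exists() or link_name.is_symlink():
        return False  # do not overwrite an existing file or (possibly broken) link
    link_name.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, link_name)
    return True
```

For example, link_library(prefix / "lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.11.0", prefix / "lib/libcudart.so"), where prefix = Path(os.environ["CONDA_PREFIX"]), mirrors the shell command.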

Running python chat.py again, I got a new error:

libcusparse.so.11: cannot open shared object file: No such file or directory

A search showed that this file is not present, so the dependency has to be installed:

pip install nvidia-cusparse-cu11

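Note that if torch is changed to 2.0.0, cusparse does not need to be installed separately.

After that, apply the same linking or environment-variable approach to this library.

Summary

There are two ways to solve this problem.

Option 1: set the environment variable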

pip install bitsandbytes==0.39.0 scipy
pip install nvidia-cusparse-cu11
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cuda_runtime/lib:$CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cusparse/lib

Option 2: link the files

pip install nvidia-cusparse-cu11
ln -s $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.11.0 $CONDA_PREFIX/lib/libcudart.so
ln -s $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/cusparse/lib/libcusparse.so.11 $CONDA_PREFIX/lib/libcusparse.so.11

After that, python chat.py runs without problems.