bytedance / decoupleQ

A quantization algorithm for LLM
Apache License 2.0

Question about run_inference_llama #5

Closed. ChuanhongLi closed this issue 3 months ago.

ChuanhongLi commented 3 months ago

We ran bash build.sh as described in the run inference demo instructions. It reported no errors, so the build appears to have succeeded. However, running bash run_inference_llama.sh xx xx failed with the following error:

from .decoupleQ_kernels import dQ_preprocess_weights_int2_for_weight_only, dQ_asymm_qw2_gemm
ImportError: /mnt/afs/quantization/test/decoupleQ/decoupleQ/decoupleQ_kernels.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv

The log is as follows:

[ 75%] Linking CUDA device code CMakeFiles/decoupleQ_kernels.dir/cmake_device_link.o
[ 83%] Linking CXX shared library libdecoupleQ_kernels.so
[ 83%] Built target decoupleQ_kernels
+ cp libdecoupleQ_kernels.so ../../decoupleQ/decoupleQ_kernels.so
+ cd ../../
(test) root@app-d2tpwnbl-74969bd46f-46nzl:/data/quantization/test/decoupleQ# 

Traceback (most recent call last):
  File "/mnt/afs/quantization/test/decoupleQ/llama.py", line 21, in <module>
    from decoupleQ.linear_w2a16 import LinearW2A16, LinearA16
  File "/mnt/afs/quantization/test/decoupleQ/decoupleQ/linear_w2a16.py", line 7, in <module>
    from .decoupleQ_kernels import dQ_preprocess_weights_int2_for_weight_only, dQ_asymm_qw2_gemm
ImportError: /mnt/afs/quantization/test/decoupleQ/decoupleQ/decoupleQ_kernels.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
(test) root@app-d2tpwnbl-74969bd46f-46nzl:/data/quantization/test/decoupleQ#

How can this be resolved? Thanks!
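To make the ImportError above easier to read, here is a minimal sketch that extracts the library path and the missing symbol from the message and decodes the qualified name of the mangled symbol. The helper names parse_undefined_symbol and nested_name are hypothetical, and the decoder handles only the simple _ZN...E nested-name form, ignoring parameter types. It is not a general demangler.

```python
import re


def parse_undefined_symbol(message):
    """Return (library_path, symbol) from an 'undefined symbol' error message, or None."""
    match = re.search(r"(\S+\.so[\w.]*): undefined symbol: (\S+)", message)
    return (match.group(1), match.group(2)) if match else None


def nested_name(symbol):
    """Decode only the qualified name of an Itanium-mangled '_ZN...E' symbol.

    Each component is stored as <decimal length><identifier>; parameter types
    after the closing 'E' are deliberately ignored in this sketch.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    return "::".join(parts) if parts else None


# The message is copied from the traceback in this issue.
message = (
    "/mnt/afs/quantization/test/decoupleQ/decoupleQ/decoupleQ_kernels.so: "
    "undefined symbol: _ZN3c104impl3cow11cow_deleterEPv"
)
library, symbol = parse_undefined_symbol(message)
print(library)
print(nested_name(symbol))
```

The decoded name sits in the c10 namespace, which belongs to PyTorch's core library. One possible explanation, not confirmed anywhere in this thread, is that decoupleQ_kernels.so was compiled against a different PyTorch build than the one imported at runtime.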

ChuanhongLi commented 3 months ago

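Hello, does inference have any requirements regarding the GPU, CUDA version, torch version, and so on? Could you provide an image that can be used directly for inference?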
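When asking about GPU, CUDA, and torch requirements, it may help to attach the versions actually in use. The following is a minimal sketch for collecting them. The function name environment_report is hypothetical, and the script assumes nothing about which versions decoupleQ needs, since the thread does not say.

```python
import platform


def environment_report():
    """Collect Python, torch, CUDA, and GPU details; torch fields stay None if torch is missing."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "gpus": [],
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    # CUDA version torch was built with (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(index)
            major, minor = torch.cuda.get_device_capability(index)
            report["gpus"].append(f"{name} (sm_{major}{minor})")
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this in the same environment used for both bash build.sh and bash run_inference_llama.sh shows whether the two steps see the same torch installation.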