bytedance / decoupleQ

A quantization algorithm for LLM
Apache License 2.0

RuntimeError: Unsupported compute type Float #9

Open ChuanhongLi opened 3 months ago

ChuanhongLi commented 3 months ago

I quantized a LLaMA 33B model with decoupleQ, and inference fails with the following error:

Traceback (most recent call last):
  File "/mnt/afs/quantization/test/decoupleQ/llama.py", line 476, in <module>
    model_output = model.generate(input_token_ids_tensor, max_length=40, do_sample=False)
...
File "/mnt/afs/quantization/test/decoupleQ/decoupleQ/linear_w2a16.py", line 36, in forward
    output = dQ_asymm_qw2_gemm(input, self.weight, self.scale, self.zp, self.bias, self.group_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Unsupported compute type Float

Have you run into this problem before?

Thanks!

MyPandaShaoxiang commented 3 months ago

@ChuanhongLi Did any intermediate inference step produce inputs that were all NaN?
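
Not from the thread itself, but one quick way to check this: a minimal sketch (assuming the quantized model is an ordinary `torch.nn.Module`; `register_nan_checks` and the printed message are illustrative names, not part of decoupleQ) that flags any layer receiving an all-NaN input during generation.

```python
import torch

def register_nan_checks(model):
    """Attach forward pre-hooks that warn when a layer receives an all-NaN input."""
    handles = []

    def make_hook(name):
        def hook(module, inputs):
            for i, t in enumerate(inputs):
                if torch.is_tensor(t) and t.is_floating_point() and torch.isnan(t).all():
                    print(f"[nan-check] input {i} of '{name}' is entirely NaN")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_pre_hook(make_hook(name)))
    return handles  # later: for h in handles: h.remove()
```

Calling `register_nan_checks(model)` before `model.generate(...)` would print a message if an all-NaN activation reaches any layer before the crash, which would confirm the suspicion above.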

ChuanhongLi commented 3 months ago

> @ChuanhongLi Did any intermediate inference step produce inputs that were all NaN?

I'm not sure. The models I've quantized so far all have some problems; see https://github.com/bytedance/decoupleQ/issues/8