bytedance / decoupleQ

A quantization algorithm for LLM
Apache License 2.0

RuntimeError: Unsupported compute type Float #9

Open ChuanhongLi opened 3 months ago

ChuanhongLi commented 3 months ago

I quantized a LLaMA 33B model with decoupleQ, and inference fails with the following error:

Traceback (most recent call last):
  File "/mnt/afs/quantization/test/decoupleQ/llama.py", line 476, in <module>
    model_output = model.generate(input_token_ids_tensor, max_length=40, do_sample=False)
...
File "/mnt/afs/quantization/test/decoupleQ/decoupleQ/linear_w2a16.py", line 36, in forward
    output = dQ_asymm_qw2_gemm(input, self.weight, self.scale, self.zp, self.bias, self.group_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Unsupported compute type Float

Have you run into this problem before?

Thanks!

MyPandaShaoxiang commented 3 months ago

@ChuanhongLi Did any intermediate inference step produce inputs that were all NaN?
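
Not from the thread itself, but one quick way to check this: a minimal sketch (assuming the quantized model is an ordinary `torch.nn.Module`; `register_nan_checks` and the printed message are illustrative names, not part of decoupleQ) that flags any layer receiving an all-NaN input during generation.

```python
import torch

def register_nan_checks(model):
    """Attach forward pre-hooks that warn when a layer receives an all-NaN input."""
    handles = []

    def make_hook(name):
        def hook(module, inputs):
            for i, t in enumerate(inputs):
                if torch.is_tensor(t) and t.is_floating_point() and torch.isnan(t).all():
                    print(f"[nan-check] input {i} of '{name}' is entirely NaN")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_pre_hook(make_hook(name)))
    return handles  # later: for h in handles: h.remove()
```

Calling `register_nan_checks(model)` before `model.generate(...)` would print a message if an all-NaN activation reaches any layer before the crash, which would confirm the suspicion above.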

ChuanhongLi commented 3 months ago

> @ChuanhongLi Did any intermediate inference step produce inputs that were all NaN?

I'm not sure. The models I've quantized so far all have some problems; see https://github.com/bytedance/decoupleQ/issues/8