bytedance / decoupleQ

A quantization algorithm for LLM
Apache License 2.0

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 #8

Open ChuanhongLi opened 4 months ago

ChuanhongLi commented 4 months ago

I quantized a Llama-2-7b-hf model with decoupleQ. When running inference with run_inference_llama.sh, it fails with:

Traceback (most recent call last):
  File "/mnt/afs/quantization/test/decoupleQ/llama.py", line 476, in <module>
    model_output = model.generate(input_token_ids_tensor, max_length=40)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/miniconda3/envs/decoupleQ/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/miniconda3/envs/decoupleQ/lib/python3.11/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/usr/local/lib/miniconda3/envs/decoupleQ/lib/python3.11/site-packages/transformers/generation/utils.py", line 2437, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

By adding do_sample=False:

model_output = model.generate(input_token_ids_tensor, max_length=40, do_sample=False)

inference runs to completion, but the output is completely garbled:

out_text: ['<s> who are you?<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>']
inference speed: e2e 1007.3947906494141 ms, pertoken 25.18486976623535 ms

Have you run into this? Is there a known fix? Thanks!
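
A quick way to check whether the NaNs originate in the model's logits rather than in the sampler (a minimal sketch, assuming the `model` and `input_token_ids_tensor` objects from llama.py above):

```python
import torch

# Run one forward pass and inspect the raw logits before any sampling.
with torch.no_grad():
    logits = model(input_token_ids_tensor).logits
print("NaN in logits:", torch.isnan(logits).any().item())
print("Inf in logits:", torch.isinf(logits).any().item())
```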

MyPandaShaoxiang commented 4 months ago

Which GPU are you running inference on? It looks like NaNs are appearing during the inference pass.
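
One way to localize where they first appear (a sketch, not decoupleQ's own tooling; it assumes the quantized `model` and `input_token_ids_tensor` from llama.py) is to hook every submodule and report the first non-finite output:

```python
import torch

def make_hook(name):
    # Print the module name whenever its output contains NaN/Inf.
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            print(f"non-finite output in: {name}")
    return hook

handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]
with torch.no_grad():
    model(input_token_ids_tensor)  # one forward pass triggers the hooks
for h in handles:
    h.remove()  # detach the hooks afterwards
```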

ChuanhongLi commented 4 months ago

> Which GPU are you running inference on? It looks like NaNs are appearing during the inference pass.

An A100-SXM4-80GB.

ChuanhongLi commented 4 months ago

I switched to a 4090 (GeForce RTX 4090) and hit the same problem.

MyPandaShaoxiang commented 4 months ago

Did you quantize with the original quantization script? Is the PPL of the quantized model reasonable?

ChuanhongLi commented 4 months ago

> Did you quantize with the original quantization script? Is the PPL of the quantized model reasonable?

I used the run_llama.sh script as-is (only changed the model path):

 =====The ppl of c4 is 11.006427764892578, logPPL is 2.398479461669922
GuoYi0 commented 4 months ago

> > Did you quantize with the original quantization script? Is the PPL of the quantized model reasonable?
>
> I used the run_llama.sh script as-is (only changed the model path):
>
> =====The ppl of c4 is 11.006427764892578, logPPL is 2.398479461669922

The PPL looks fine.
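
For reference, the two reported numbers are at least self-consistent, since PPL = exp(logPPL):

```python
import math

print(math.exp(2.398479461669922))  # ≈ 11.006, matching the reported c4 PPL
```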

MyPandaShaoxiang commented 4 months ago

Could you share a link to the .pt files (both real_quant and fake_quant)? We'd like to take a look.

ChuanhongLi commented 4 months ago

> Could you share a link to the .pt files (both real_quant and fake_quant)? We'd like to take a look.

The files are on our internal network and can't be copied out. The model is the stock open-source release with no modifications, and the run script only had the model path changed.

Pydataman commented 1 month ago

Has this been resolved? I used decoupleQ to quantize a large TTS model and hit the same error.