THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] <RuntimeError: probability tensor contains either `inf`, `nan` or element < 0> #1211

Open pjw80921 opened 1 year ago

pjw80921 commented 1 year ago

Is there an existing issue for this?

Current Behavior

When loading chatglm-6b-int8, asking a question raises an error:

```
Traceback (most recent call last):
  File "cli_demo.py", line 56, in <module>
    main()
  File "cli_demo.py", line 38, in main
    for resp, history in local_doc_qa.get_knowledge_based_answer(query=query,
  File "/root/langchain-ChatGLM/chains/local_doc_qa.py", line 303, in get_knowledge_based_answer
    for answer_result in self.llm.generatorAnswer(prompt=prompt, history=chat_history,
  File "/root/langchain-ChatGLM/models/chatglm_llm.py", line 52, in generatorAnswer
    for inum, (stream_resp, _) in enumerate(self.checkPoint.model.stream_chat(
  File "/root/anaconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1311, in stream_chat
    for outputs in self.stream_generate(inputs, gen_kwargs):
  File "/root/anaconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1404, in stream_generate
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```

Expected Behavior

No response

Steps To Reproduce

1. Download https://github.com/imClumsyPanda/langchain-ChatGLM
2. Following its README.md, download the int8 quantized model weights (.bin)
3. Start with `python cli_demo.py`
4. After the knowledge base finishes loading, enter a question
5. The error above is raised

Environment

- OS: CentOS
- Python: 3.8
- Transformers: default
- PyTorch: 
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): 

Anything else?

No response

codingfun2022 commented 1 year ago

You probably forgot to call `model.half()`. I hit the same error, and it disappeared after I added the `model.half()` call. The example code in README.md includes this call, though I don't understand why it is always necessary.

For example, the following `half()` call is necessary:

```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
#                                                                             ^^^^^^
```
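For completeness, a hedged sketch of the two loading variants described in the ChatGLM-6B README; the `THUDM/chatglm-6b-int8` model id is an assumption based on the reporter's checkpoint name:

```python
from transformers import AutoModel

# GPU: run the checkpoint in fp16. Without .half(), fp32 activations mixed
# with quantized weights can produce NaN logits, which then trip
# torch.multinomial during sampling.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b-int8", trust_remote_code=True
).half().cuda()

# CPU fallback (assumption: no CUDA available): use float32 instead,
# since half precision is poorly supported on CPU.
# model = AutoModel.from_pretrained(
#     "THUDM/chatglm-6b-int8", trust_remote_code=True
# ).float()
```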
SCZwangxiao commented 1 year ago

See https://github.com/THUDM/ChatGLM-6B/issues/31#issuecomment-1709335189