THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
Apache License 2.0

glm-4-9b-chat-1m runs fine on 2x Tesla P40 (24G), but errors out when a third Tesla P40 (24G) is added #258

Closed SH0AN closed 2 days ago

SH0AN commented 2 days ago

System Info / 系統信息

GPU: Tesla P40(24G) * 3 System: Windows Server 2022 Standard Cuda: 12.1 Python: 3.9.19 Torch: 2.3.1 Transformers: 4.41.2

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

Set up the model and launched trans_web_demo.py; after submitting a prompt, the following error was raised:

```
C:\Users\Administrator\.cache\huggingface\modules\transformers_modules\glm-4-9b-chat-1m\modeling_chatglm.py:271: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
Exception in thread Thread-6:
Traceback (most recent call last):
  File "D:\anaconda3\envs\sen\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "D:\anaconda3\envs\sen\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "D:\anaconda3\envs\sen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\anaconda3\envs\sen\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
    result = self._sample(
  File "D:\anaconda3\envs\sen\lib\site-packages\transformers\generation\utils.py", line 2437, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
```
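The `RuntimeError` fires inside sampling, when `torch.multinomial` receives probabilities containing `nan`/`inf`, which on multi-GPU setups often points at the shards placed on the extra card rather than at the model itself. A minimal sketch of one quick way to test that hypothesis (not from this thread, and assuming GPU indices 0 and 1 are the pair that worked): hide the third P40 from the process before `torch` is imported, since torch enumerates CUDA devices at import time.

```python
import os

# Hypothetical workaround: expose only the two known-good P40s to this
# process. This line must execute before `import torch` anywhere in the
# program (e.g. at the very top of trans_web_demo.py), because torch
# fixes its visible-device list when it is first imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

print(os.environ["CUDA_VISIBLE_DEVICES"])  # → 0,1
```

If the demo then runs cleanly again, the problem is isolated to how the model is sharded onto the third card (device map, dtype, or the card itself) rather than to the weights or the prompt.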

Expected behavior / 期待表现

I would like glm-4-9b to answer my prompts normally.

zRzRzRzRzRzRzR commented 2 days ago

See the previous issue; this is a duplicate.