System Info / 系統信息
GPU: Tesla P40(24G) * 3
System: Windows Server 2022 Standard
Cuda: 12.1
Python: 3.9.19
Torch: 2.3.1
Transformers: 4.41.2
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
[X] The official example scripts / 官方的示例脚本
[ ] My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
Running trans_web_demo.py and submitting a prompt raises the following error:
C:\Users\Administrator\.cache\huggingface\modules\transformers_modules\glm-4-9b-chat-1m\modeling_chatglm.py:271: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
Exception in thread Thread-6:
Traceback (most recent call last):
File "D:\anaconda3\envs\sen\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "D:\anaconda3\envs\sen\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "D:\anaconda3\envs\sen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\anaconda3\envs\sen\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "D:\anaconda3\envs\sen\lib\site-packages\transformers\generation\utils.py", line 2437, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
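To make the failure condition concrete, here is a minimal pure-Python sketch (not the code in transformers or PyTorch) of how a single non-finite logit contaminates the whole sampling distribution. The helper names `softmax` and `find_invalid` are hypothetical and exist only for this illustration. It assumes the invalid probabilities originate from non-finite logits, which the traceback alone does not confirm.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def find_invalid(probs):
    """Return the indices of entries that are nan, inf, or negative.

    This mirrors the condition named in the RuntimeError message above.
    """
    return [
        i for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


# One nan logit makes the normalizing sum nan, so every probability is nan.
bad = softmax([1.0, float("nan"), 2.0])
print(find_invalid(bad))

# Finite logits give a valid distribution with no flagged entries.
good = softmax([0.5, 1.5])
print(find_invalid(good))
```

A check of this kind, applied to the model's logits just before sampling, can help narrow down whether the non-finite values come from the forward pass or from a later processing step.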
Expected behavior / 期待表现
I expect glm-4-9b to answer my question normally.