Flash attention only supports architecture with mp version 2.2. But now attempts to run on a mp 2.1 gpu. (Triggered internally at /home/torch_musa/torch_musa/csrc/aten/ops/attention/mudnn/SDPUtils.h:28.)
return func(*args, **kwargs)
muDNN(v2400) 2024-03-27 21:57:33.445104 0d:0h:0m:20s TID=0xbea59ff44d8e6e9f GPU=0 Handle=0x7b6d170 ERROR# NOT_SUPPORTED in Reduce::Run, Reason:
The reduce mode(PROD) not support input dtype(BOOL) & output dtype(INT64)
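The first warning says the flash attention kernel requires mp (compute architecture) version 2.2, while the GPU reports 2.1. A possible workaround, sketched below under the assumption that torch_musa honors upstream PyTorch's SDPA backend toggles (the `torch.backends.cuda.*` names are the upstream CUDA-branded APIs and are not confirmed for MUSA), is to disable the flash kernel so `scaled_dot_product_attention` falls back to a portable implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical workaround: turn off the flash kernel and keep the
# fallback backends enabled, so SDPA never dispatches to flash
# attention on an unsupported (mp 2.1) architecture.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(True)  # portable math fallback

# Dummy (batch, heads, seq_len, head_dim) tensors to exercise SDPA.
q = k = v = torch.randn(1, 2, 8, 16)
out = F.scaled_dot_product_attention(q, k, v)
```

Whether these toggles actually affect the MUSA dispatch path would need to be verified against the torch_musa sources; they are shown here only as the closest upstream equivalent.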
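The second error says muDNN's Reduce kernel cannot run a PROD reduction with BOOL input and INT64 output, which is what PyTorch produces when `.prod()` is called on a boolean tensor. A minimal sketch of a possible workaround (an assumption on my part, not taken from torch_musa documentation) is to cast the tensor to `int64` on the Python side before reducing, so the backend never sees a BOOL input:

```python
import torch

# Calling .prod() directly on a bool tensor asks the backend for a
# BOOL -> INT64 PROD reduction, which muDNN reports as NOT_SUPPORTED.
mask = torch.tensor([True, True, False])

# Workaround sketch: cast first, so the reduction runs entirely in
# int64 (True -> 1, False -> 0); the product is 0 iff any entry is False.
result = mask.to(torch.int64).prod()
```

An equivalent alternative for this particular use is `mask.all()`, which avoids the PROD reduction altogether.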