QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

[BUG] RuntimeError: Expected attn_mask dtype to be bool or to match query dtype, but got attn_mask.dtype: c10::BFloat16 and query.dtype: c10::Half instead. #1276

Closed: yuyu990116 closed this issue 3 weeks ago

yuyu990116 commented 1 month ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

After running run_gptq.py with the Qwen-14B-Chat model, I tried `model.chat`, but got this error: RuntimeError: Expected attn_mask dtype to be bool or to match query dtype, but got attn_mask.dtype: c10::BFloat16 and query.dtype: c10::Half instead.

I don't know why and how to fix it.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- Python: Python 3.10.14
- auto_gptq: '0.8.0.dev0'
- Transformers: '4.41.2'
- PyTorch: '2.2.1'
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8

Anything else?

No response

jklj077 commented 3 weeks ago

Hi,

Qwen1.0 is no longer actively maintained; please consider upgrading to Qwen2.

The error `RuntimeError: Expected attn_mask dtype to be bool or to match query dtype, but got attn_mask.dtype: c10::BFloat16 and query.dtype: c10::Half instead` indicates a dtype mismatch: the attention mask is bf16 while the query tensors are fp16. Quantized models should run in fp16 (`torch.half` / `torch.float16`), so load the model in fp16:

- For Qwen1.0, pass `fp16=True` to `AutoModelForCausalLM.from_pretrained`.
- For Qwen1.5/Qwen2, pass `torch_dtype=torch.float16` to `AutoModelForCausalLM.from_pretrained`.
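
A minimal sketch of the fp16 loading path described above, assuming a locally GPTQ-quantized Qwen-14B-Chat checkpoint (the paths below are placeholders, not official repo names):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the checkpoint produced by run_gptq.py.
ckpt = "path/to/Qwen-14B-Chat-Int4"

tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)

# Qwen1.0: pass fp16=True so the model runs entirely in torch.float16
# and the attention mask matches the query dtype.
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    device_map="auto",
    trust_remote_code=True,
    fp16=True,
).eval()

response, _ = model.chat(tokenizer, "你好", history=None)
print(response)

# Qwen1.5/Qwen2 equivalent: no custom code needed, set the dtype explicitly.
# model = AutoModelForCausalLM.from_pretrained(
#     "path/to/quantized-Qwen2-checkpoint",  # placeholder
#     device_map="auto",
#     torch_dtype=torch.float16,
# )
```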