Closed: yuyu990116 closed this issue 3 weeks ago
Hi,
Qwen1.0 is no longer actively maintained; please consider upgrading to Qwen2.
The error

> RuntimeError: Expected attn_mask dtype to be bool or to match query dtype, but got attn_mask.dtype: c10::BFloat16 and query.dtype: c10::Half instead

indicates a dtype mismatch: the attention mask is bfloat16 while the query is float16. Since quantized models should run in fp16 (`torch.half` / `torch.float16`), load the model in fp16. For Qwen1.0, pass `fp16=True` to `AutoModelForCausalLM.from_pretrained`; for Qwen1.5/Qwen2, pass `torch_dtype=torch.float16` to `AutoModelForCausalLM.from_pretrained`.
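As a minimal sketch of the two loading paths (the helper name and the model id are illustrative, not part of the Qwen API):

```python
import torch

def fp16_load_kwargs(generation: str) -> dict:
    """Illustrative helper: pick the fp16 loading kwargs per Qwen generation."""
    if generation == "qwen1.0":
        # Qwen1.0's custom modeling code takes fp16=True directly.
        return {"fp16": True, "trust_remote_code": True}
    # Qwen1.5/Qwen2 use the standard transformers argument.
    return {"torch_dtype": torch.float16}

# Usage (downloads weights, so shown here as a comment):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen-14B-Chat-Int4", **fp16_load_kwargs("qwen1.0"))
```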
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
After quantizing the Qwen-14B-Chat model with run_gptq.py, I tried `model.chat`, but got this error: RuntimeError: Expected attn_mask dtype to be bool or to match query dtype, but got attn_mask.dtype: c10::BFloat16 and query.dtype: c10::Half instead.
I don't know why this happens or how to fix it.
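For context, the mismatch the traceback points at can be reproduced directly with `torch.nn.functional.scaled_dot_product_attention` (a sketch using fp32 queries so it runs on CPU; the quantized model hits the same check with fp16 queries and a bf16 mask):

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 2, 4, 8)  # float32 query/key/value
# Mask in a dtype that is neither bool nor the query dtype:
bad_mask = torch.zeros(1, 2, 4, 4, dtype=torch.bfloat16)

try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=bad_mask)
except RuntimeError as e:
    print(e)  # "Expected attn_mask dtype to be bool or to match query dtype, ..."

# Casting the mask to the query dtype (which loading the model in fp16
# achieves end-to-end) makes the call succeed:
out = F.scaled_dot_product_attention(q, k, v, attn_mask=bad_mask.to(q.dtype))
```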
Expected Behavior
No response
Steps To Reproduce
No response
Environment
Anything else?
No response