Checklist
[ ] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.
[ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
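I used lmdeploy to quantize the Internvl2-8B model to INT4 with AWQ, then ran inference with the same query on the model before and after quantization.
Before quantization: generate_token_len=155, inference time 4 s.
After quantization: generate_token_len=134, inference time 29 s, roughly 7x slower.
Is this expected?
Reproduction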
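The two runs generated different numbers of tokens, so a per-token comparison may be more informative than the raw wall-clock times. The minimal Python sketch below is not the reporter's script. It uses only the figures reported above and assumes each reported time is the end-to-end latency of a single request, with prefill included. The helper names `tokens_per_second` and `compare` are hypothetical.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput over the whole request (prefill is included, by assumption)."""
    return tokens / seconds


def compare(baseline: tuple, quantized: tuple) -> dict:
    """Compare two (generated_tokens, latency_seconds) measurements."""
    base_tps = tokens_per_second(*baseline)
    quant_tps = tokens_per_second(*quantized)
    return {
        "baseline_tok_per_s": base_tps,
        "quantized_tok_per_s": quant_tps,
        # Ratio of raw end-to-end latencies.
        "wall_clock_slowdown": quantized[1] / baseline[1],
        # Ratio of average time spent per generated token.
        "per_token_slowdown": base_tps / quant_tps,
    }


if __name__ == "__main__":
    # Figures from the report: (generate_token_len, seconds)
    result = compare((155, 4.0), (134, 29.0))
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

When run as-is, the script prints 38.75 and 4.62 tok/s, a 7.25x wall-clock slowdown, and an 8.39x per-token slowdown. On this basis, the AWQ run is slower per generated token than the 7x wall-clock figure suggests.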
Environment
Error traceback
No response