Open hezeli123 opened 3 months ago
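Checklist
Describe the bug
On an A10 GPU, Mini-InternVL-Chat-2B-V1-5 runs inference more slowly after AWQ quantization than before. Load tests show that quantization does not improve inference throughput and in some cases reduces it. AWQ is slightly faster at concurrency 1, 16 and 32, but slower at 2, 4, 8 and 64. The largest gap is at concurrency 8 (80.8 vs 94.4 tokens/s). Throughput on the same test set, unquantized ("norm") vs AWQ:

| Concurrency | norm tokens/s | AWQ tokens/s |
| --- | --- | --- |
| 1 | 40.93 | 42.06 |
| 2 | 62 | 60.52 |
| 4 | 79.08 | 73.32 |
| 8 | 94.4 | 80.8 |
| 16 | 94.4 | 98.88 |
| 32 | 100 | 101.44 |
| 64 | 123.52 | 119.68 |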
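The per-concurrency differences can be checked with a few lines of Python. The throughput values are copied from the table in this issue. The "norm" column is assumed to be the unquantized model. `relative_change` and `summarize` are illustrative helpers, not part of lmdeploy.

```python
# Throughput (tokens/s) from the table in this issue: concurrency -> (norm, awq).
# Assumption: "norm" is the unquantized model.
results = {
    1: (40.93, 42.06),
    2: (62.0, 60.52),
    4: (79.08, 73.32),
    8: (94.4, 80.8),
    16: (94.4, 98.88),
    32: (100.0, 101.44),
    64: (123.52, 119.68),
}


def relative_change(norm: float, awq: float) -> float:
    """Percentage change of AWQ throughput relative to the unquantized model."""
    return (awq - norm) / norm * 100.0


def summarize(data):
    """Return (concurrency, percent change) pairs sorted by concurrency."""
    return [(c, relative_change(n, a)) for c, (n, a) in sorted(data.items())]


if __name__ == "__main__":
    for concurrency, change in summarize(results):
        print(f"concurrency={concurrency:>2}  AWQ vs norm: {change:+.1f}%")
```

It reports roughly +2.8% at concurrency 1 and -14.4% at concurrency 8. This is consistent with the description: a small effect in either direction at most concurrency levels, and a clear regression at 4 and 8.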
Reproduction
lmdeploy lite auto_awq Mini-InternVL-Chat-2B-V1-5 --calib-dataset 'ptb' --calib-samples 128 --calib-seqlen 2048 --w-bits 4 --w-group-size 128 --batch-size 1 --search-scale False --work-dir ./Mini-InternVL-Chat-2B-V1-5-awq
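`--w-bits 4 --w-group-size 128` requests 4-bit weights with one set of quantization parameters per group of 128 weights. The following sketch shows plain group-wise asymmetric min-max quantization under those two settings. It is an illustration, not lmdeploy's implementation, and it omits AWQ's activation-aware scaling step. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float, int]:
    """Asymmetric min-max quantization of one group to unsigned `bits`-bit codes."""
    qmax = (1 << bits) - 1  # 15 for 4-bit weights
    w_min, w_max = min(weights), max(weights)
    scale = max(w_max - w_min, 1e-5) / qmax  # one floating-point scale per group
    zero = min(max(-round(w_min / scale), 0), qmax)  # integer zero point, clamped to the code range
    codes = [min(max(round(w / scale) + zero, 0), qmax) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Reconstruct approximate weights (weight-only kernels do this conversion on the fly)."""
    return [(q - zero) * scale for q in codes]


def quantize_row(row: List[float], bits: int = 4, group_size: int = 128):
    """Split a weight row into chunks of `group_size` and quantize each chunk independently."""
    return [quantize_group(row[i:i + group_size], bits) for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    row = [math.sin(i) for i in range(256)]  # stand-in for one row of a weight matrix
    groups = quantize_row(row)
    restored = [w for codes, s, z in groups for w in dequantize_group(codes, s, z)]
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, max abs error {max_err:.4f}")
```

With this scheme each stored weight is a 4-bit code, and every group of 128 also carries a scale and a zero point. The kernel must undo that mapping during inference, so the smaller weight footprint does not automatically translate into higher throughput.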
lmdeploy serve api_server Mini-InternVL-Chat-2B-V1-5-awq/ --server-port 8000 --model-format awq
Environment
Error traceback
No response