Open spongezz opened 11 months ago
Hi @spongezz , could you provide more details on the accuracy difference? There are several differences between our AWQ implementation and llm-awq:
- lm_head is quantized in ammo (this causes an accuracy drop on some models, so we disabled it in the most recent release last weekend)
- ammo uses symmetric quantization instead of the asymmetric quantization in llm-awq, which causes a slightly larger accuracy drop (see the sketch below)
- llm-awq combines the AWQ scale search with weight clipping, while ammo by default only runs the AWQ scale search for faster quantization
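To make the symmetric-vs-asymmetric point concrete, here is a tiny numeric sketch in plain PyTorch. It is not ammo or llm-awq code, just a generic INT4 round-trip on one weight group, showing why a per-group zero-point can recover slightly more accuracy on skewed weights:

```python
# Generic illustration (not ammo/llm-awq source): symmetric vs. asymmetric
# INT4 round-trip quantization of a single weight group.
import torch

def quantize_symmetric(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Symmetric: zero-point fixed at 0, grid spans [-qmax, qmax].
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

def quantize_asymmetric(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Asymmetric: a zero-point shifts the grid onto [w.min(), w.max()].
    qmax = 2 ** n_bits - 1
    scale = (w.max() - w.min()) / qmax
    zero = (-w.min() / scale).round()
    q = (w / scale + zero).round().clamp(0, qmax)
    return (q - zero) * scale

group = torch.randn(128) * 0.02 + 0.01  # a skewed weight group
for name, fn in [("symmetric", quantize_symmetric), ("asymmetric", quantize_asymmetric)]:
    err = (fn(group) - group).pow(2).mean().item()
    print(f"{name:10s} MSE: {err:.3e}")
```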
Same problem here. There is a big gap between the AWQ score and the FP16 score. Are there any other parameters that can be adjusted?
@RalphMao Thank you for your reply! I will recheck and give you feedback soon.
Is your problem solved?
Not yet. I tried using full quantization, but it was too slow; I don't know whether it has become faster in the latest commit. For now I just use INT8 quantization to meet my needs. Would you please let me know if you solve the problem?
I now use GPTQ instead of AWQ and the accuracy is acceptable. https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md#gptq
@spongezz Do you still have the problem? If not, we will close it soon.
If I am not mistaken, the AWQ implemented in ammo uses a default alpha_step = 0.1 to search for the scaling parameter. However, the model quantized by ammo shows a larger performance drop than one quantized with llm-awq. Is ammo open source, so that I can check whether I made a mistake?
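For context, my understanding of that alpha grid search is roughly the following. This is a simplified, hypothetical sketch; the function name, the activation-magnitude scaling rule, and the error metric are my own assumptions, not ammo's or llm-awq's actual code:

```python
# Hypothetical AWQ-style alpha grid search: sweep alpha from 0 to 1 in steps
# of alpha_step, scale input channels by act_mean**alpha, quantize the scaled
# weights, and keep the alpha that minimizes the output reconstruction error.
import torch

def awq_search_scale(w: torch.Tensor, x: torch.Tensor,
                     n_bits: int = 4, alpha_step: float = 0.1) -> float:
    """w: [out, in] weight matrix, x: [tokens, in] calibration activations."""
    def quantize(t: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (n_bits - 1) - 1
        scale = t.abs().amax(dim=1, keepdim=True) / qmax  # per-output-channel scale
        return (t / scale).round().clamp(-qmax, qmax) * scale

    ref = x @ w.t()                           # full-precision reference output
    act_mean = x.abs().mean(dim=0) + 1e-6     # per-input-channel activation magnitude
    best_alpha, best_err = 0.0, float("inf")
    for step in range(int(round(1 / alpha_step)) + 1):
        alpha = step * alpha_step
        s = act_mean ** alpha                 # per-input-channel AWQ scale
        w_q = quantize(w * s) / s             # quantize scaled weights, fold scale back
        err = (x @ w_q.t() - ref).pow(2).mean().item()
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha

# Example usage with random calibration data (illustration only):
w = torch.randn(64, 128)
x = torch.randn(256, 128)
print("best alpha:", awq_search_scale(w, x))
```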