InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Maybe a workaround for qwen2 quantization Nan error #1844

Open AllentDan opened 4 days ago

zhyncs commented 4 days ago

ref AutoAWQ implementation

https://github.com/casper-hansen/AutoAWQ/blob/c53cc7e8cf65747bab526a3c9e9ee37e580b8c39/awq/quantize/quantizer.py#L257-L259

it uses 1e-6
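A minimal sketch (not AutoAWQ's actual code) of why that `1e-6` matters: AWQ-style scale search divides one per-channel statistic by another, and a channel whose statistic is exactly zero turns the division into inf/NaN, which then propagates through the quantized layer. The function and variable names below are illustrative assumptions, not AutoAWQ's API.

```python
import numpy as np

def awq_like_scales(x_stat, w_stat, ratio=0.5, eps=1e-6):
    """Per-channel scales ~ x_stat**ratio / (w_stat**(1-ratio) + eps).

    Illustrative only: the epsilon in the denominator keeps an
    all-zero channel from producing inf, and the clamp mirrors the
    clamp in the linked AutoAWQ lines.
    """
    scales = x_stat**ratio / (w_stat**(1 - ratio) + eps)
    return np.clip(scales, 1e-4, None)

w_stat = np.array([0.0, 0.5, 2.0])   # first channel statistic is zero
x_stat = np.array([1.0, 1.0, 1.0])

bad = x_stat**0.5 / w_stat**0.5      # first entry becomes inf
good = awq_like_scales(x_stat, w_stat)  # finite everywhere
```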

AllentDan commented 4 days ago

> ref AutoAWQ implementation
>
> https://github.com/casper-hansen/AutoAWQ/blob/c53cc7e8cf65747bab526a3c9e9ee37e580b8c39/awq/quantize/quantizer.py#L257-L259
>
> it uses 1e-6

I just found that it was updated recently; my locally downloaded copy of AutoAWQ did not use the `+1e-6`.

AllentDan commented 3 days ago

> ref AutoAWQ implementation
>
> https://github.com/casper-hansen/AutoAWQ/blob/c53cc7e8cf65747bab526a3c9e9ee37e580b8c39/awq/quantize/quantizer.py#L257-L259
>
> it uses 1e-6

Sorry, I found it should be https://github.com/casper-hansen/AutoAWQ/blob/c53cc7e8cf65747bab526a3c9e9ee37e580b8c39/awq/quantize/quantizer.py#L325 that needs to be clamped. And there are likely some other places in AutoAWQ that also need updating.
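To illustrate the kind of clamp being pointed at: the pseudo-quantization step divides the weights by a per-group scale, and an all-zero group makes that scale zero, so the divisor itself needs protection. The sketch below is a generic min-max pseudo-quantizer under that assumption; the names are hypothetical and do not come from AutoAWQ.

```python
import numpy as np

def pseudo_quantize(w, n_bits=4, eps=1e-6):
    """Round-trip (quantize then dequantize) w with per-row min-max scales.

    Clamping the range with eps ensures an all-zero row cannot yield
    scale == 0, which would otherwise turn w / scale into NaN.
    """
    max_int = 2**n_bits - 1
    w_max = w.max(axis=1, keepdims=True)
    w_min = w.min(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, eps) / max_int   # the clamp
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, max_int)
    return (q - zero) * scale

w = np.vstack([np.zeros(8),                 # degenerate all-zero group
               np.linspace(-1.0, 1.0, 8)])  # ordinary group
wq = pseudo_quantize(w)                     # finite everywhere
```

Without the `np.maximum(..., eps)`, the first row would produce `0 / 0 = NaN` in the `zero` computation and the NaN would survive dequantization.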