casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

raise Exception (the loss becomes NaN) when quantizing DeepSeek-V2-Chat with the new version of AutoAWQ at sub-iteration 18/60 #535

Open BinFuPKU opened 2 months ago

BinFuPKU commented 2 months ago

It runs well at the start.

It raises an exception when the quantization process reaches 30% (quantizing DeepSeek-V2-Chat with AutoAWQ takes about 1~2 hours to get there).

The loss becomes NaN in sub-iteration 18/60, which is abnormal.
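
For reference, the failing call comes from a driver script along these lines (a minimal sketch reconstructed from the traceback below; the model path and quant_config values are placeholders, not the exact setup):

```python
# Minimal sketch of the quantization driver (placeholders, not the exact setup).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "deepseek-ai/DeepSeek-V2-Chat"  # assumed model id
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# This call runs fine for the first 17 of 60 modules, then hits a NaN loss on module 18.
model.quantize(tokenizer, quant_config=quant_config)
```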

```text
AWQ:  30%|███       | 18/60 [4:07:51<10:14:05, 877.27s/it]
...
Computing Loss (loss: nan):   0%|          | 0/1 [00:00<?, ?it/s]
Grid Search (Best: -1):  90%|█████████ | 18/20 [00:19<00:02, 1.02s/it]
Grid Search (Best: -1):  95%|█████████▌| 19/20 [00:19<00:01, 1.02s/it]
Computing Loss:   0%|          | 0/1 [00:00<?, ?it/s]
Computing Loss (loss: nan):   0%|          | 0/1 [00:00<?, ?it/s]
Grid Search (Best: -1):  95%|█████████▌| 19/20 [00:20<00:01, 1.02s/it]
Grid Search (Best: -1): 100%|██████████| 20/20 [00:20<00:00, 1.02s/it]
AWQ:  30%|███       | 18/60 [4:08:27<9:39:44, 828.20s/it]
Traceback (most recent call last):
  File "/home/xiaoi/dq/fubin/alignment/Quantization.py", line 14, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/models/base.py", line 230, in quantize
    self.quantizer.quantize()
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 166, in quantize
    scales_list = [
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 167, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/opt/nlp/anaconda3/envs/moe_new_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 332, in _search_best_scale
    best_scales = self._compute_best_scale(
  File "/home/xiaoi/dq/download/AutoAWQ-main/awq/quantize/quantizer.py", line 413, in _compute_best_scale
    raise Exception
Exception
```

Maybe there is a bug in the AwqQuantizer class?
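
For context on the `Grid Search (Best: -1)` lines: `_compute_best_scale` grid-searches a scaling ratio and keeps the one with the lowest loss. A NaN loss never compares as smaller than anything, so when every candidate loss is NaN the best ratio stays at -1 and the method ends in a bare `raise Exception`. A simplified sketch of that bookkeeping (not the exact AutoAWQ source):

```python
def pick_best_ratio(losses):
    """Simplified sketch of the grid-search bookkeeping in _compute_best_scale."""
    best_ratio = -1
    best_loss = float("inf")
    for i, loss in enumerate(losses):
        # nan < best_loss is always False, so a NaN loss never updates best_ratio.
        if loss < best_loss:
            best_loss = loss
            best_ratio = i
    if best_ratio == -1:
        # The path hit in the log above: every loss in the grid search was NaN.
        raise Exception
    return best_ratio


pick_best_ratio([0.3, 0.2, 0.5])       # -> 1
pick_best_ratio([float("nan")] * 20)   # -> raises Exception, as in the traceback
```

So the exception itself is only a symptom: the NaN is already present when the loss for that module is computed, and the grid search merely reports that no usable ratio was found.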

casper-hansen commented 1 month ago

Hi @BinFuPKU, thanks for raising the issue. I will need to investigate further what causes this, but I can see it will not be easy to debug since the model is so large. Do you have any smaller models where you have observed NaN loss values?

WanBenLe commented 1 month ago

For LLMs with many parameters, increasing the length of the individual texts in the calibration dataset can help avoid this problem, but it is still recommended to tailor the calibration data to the model.
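
If it helps, calibration data can be passed explicitly to `quantize()`. Continuing the sketch earlier in the thread, something like the following (the exact keyword names can differ between AutoAWQ versions, so check the signature of the installed release; `calib_data` as a list of strings is the commonly supported form):

```python
# Sketch: supplying longer calibration texts explicitly (reuses model, tokenizer,
# and quant_config from the earlier sketch; the texts here are placeholders).
calib_texts = [
    "A long, representative document for this model ...",
    "Another long calibration sample ...",
]

model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=calib_texts,
)
```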

Kk1984up commented 1 month ago

I got the same issue when quantizing Qwen2-7B-Chat using AutoAWQ version 0.2.3. How can I fix it?

casper-hansen commented 1 month ago

@Kk1984up try upgrading to the newest version
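
For reference, upgrading to the latest release:

```
pip install -U autoawq
```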