AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
MIT License

What magnitude of avg loss indicates a relatively good result for a quantized model #649

Closed · ehuaa closed this issue 2 months ago

ehuaa commented 4 months ago

When I quantize a model, the avg loss is lower in the earlier layers (~0.02) than in the later layers (~2.0). Does a large avg loss mean the quantization has failed? And in your experience, what magnitude of avg loss indicates a good quantized model?

Qubitium commented 4 months ago

My rule of thumb: if your losses are > 1.0 for the early layers [1-3], the calibration data is off or the tokenizer is not properly configured. Each module in each layer has its own loss trend in my experience. Some modules are just harder to quantize. MoE models are the worst case for GPTQ due to the gating/router layer.
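For context, a minimal quantization sketch in the style of the AutoGPTQ README; the model name and calibration text are placeholders, and the per-module avg loss discussed here is emitted through Python logging during `model.quantize(...)`:

```python
import logging

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Surface AutoGPTQ's per-layer/per-module "avg loss" log lines.
logging.basicConfig(level=logging.INFO)

pretrained_model_dir = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# Calibration data should reflect the model's real use case and be tokenized
# with the model's own, properly configured tokenizer.
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library "
              "with user-friendly apis, based on GPTQ algorithm.")
]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)  # avg loss per quantized module is logged here
model.save_quantized("opt-125m-4bit")
```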

ehuaa commented 4 months ago

> My rule of thumb: if your losses are > 1.0 for the early layers [1-3], the calibration data is off or the tokenizer is not properly configured. Each module in each layer has its own loss trend in my experience. Some modules are just harder to quantize. MoE models are the worst case for GPTQ due to the gating/router layer.
>
> • use the running quant avg loss as a guide to a usable quant
> • run PPL after quant (test 1)
> • run a human eval test (test 2)

Thanks for your quick reply! @Qubitium My losses are below 0.05 in the first three layers, as you mentioned above, but they eventually rise above 10.0 in the last 40 layers. Is that normal in your experience? (PS: my model is a finetuned version of Qwen-72b-chat, which has 80 layers in total.) I'll run the two tests you mentioned above after I finish quantizing my model.
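For "test 1", a minimal wikitext-2 perplexity sketch; the checkpoint path is a placeholder (loading a GPTQ checkpoint this way assumes the transformers/optimum GPTQ integration), and pointing the same script at the fp16 and quantized checkpoints gives the before/after comparison:

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/model-or-quantized-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
model.eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
seq_len = encodings.input_ids.size(1)

window = 2048  # non-overlapping windows for simplicity
nll_sum, n_tokens = 0.0, 0
for begin in range(0, seq_len - 1, window):
    input_ids = encodings.input_ids[:, begin:begin + window].to(model.device)
    if input_ids.size(1) < 2:
        break  # a trailing 1-token window has no targets after the shift
    with torch.no_grad():
        # labels == input_ids: HF shifts internally for next-token loss
        loss = model(input_ids, labels=input_ids).loss
    num_pred = input_ids.size(1) - 1  # tokens actually predicted
    nll_sum += loss.item() * num_pred
    n_tokens += num_pred

print(f"perplexity: {math.exp(nll_sum / n_tokens):.3f}")
```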

ehuaa commented 4 months ago

> My rule of thumb: if your losses are > 1.0 for the early layers [1-3], the calibration data is off or the tokenizer is not properly configured. Each module in each layer has its own loss trend in my experience. Some modules are just harder to quantize. MoE models are the worst case for GPTQ due to the gating/router layer.
>
> • use the running quant avg loss as a guide to a usable quant
> • run PPL after quant (test 1)
> • run a human eval test (test 2)

@Qubitium I have finished the two tests you mentioned above. The result of test 1 (PPL) looks reasonable, but the human eval score drops by about 50% after quantization. Do you have any advice on how to fix this? Thanks.

Qubitium commented 4 months ago

What is your PPL before and after quantization?

ehuaa commented 4 months ago

> What is your PPL before and after quantization?

My PPL before quantization on wiki2 is 5.334; after quantization it is 5.415. My model is a finetuned version of Qwen1.5-72B. The HumanEval result before quantization is 0.677, while after quantization it drops to 0.372. @Qubitium
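For reference, a pass@1 comparison like this might be scripted with OpenAI's human-eval harness; the checkpoint path and generation settings below are placeholders, and greedy decoding is just one reasonable choice:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl  # pip install human-eval

model_dir = "path/to/model-or-quantized-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
model.eval()

def complete(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Return only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.input_ids.size(1):],
                            skip_special_tokens=True)

problems = read_problems()
samples = [dict(task_id=tid, completion=complete(problems[tid]["prompt"]))
           for tid in problems]
write_jsonl("samples.jsonl", samples)
# Then score with the harness CLI: evaluate_functional_correctness samples.jsonl
```

Running the same script against the fp16 and quantized checkpoints isolates how much of the HumanEval drop comes from quantization rather than from the evaluation setup.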

Qubitium commented 4 months ago

A pre-quant PPL of 5.33 is already very suspect, in my opinion, for such a huge model. Forget the quant; troubleshoot your PPL/inference pre-quant first. Make sure your PPL eval is not using the same dataset as calibration, but your real use case.

ipengx1029 commented 1 month ago

When I quantize llama-7b-hf, why is the avg loss so large? Is it normal? (Screenshot failed to upload.)