casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

quantize models with large context #492

Open chennnM opened 1 month ago

chennnM commented 1 month ago

I want to quantize the CodeQwen model using a custom dataset, but all of my samples are longer than 512 tokens. Why doesn't AWQ support samples with lengths over 512? Are there any alternative methods for quantizing models with a large context?
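
For context, a typical AutoAWQ quantization run with a custom dataset looks like the sketch below; the CodeQwen model path is a placeholder, and `calib_data` accepts a plain list of strings. The 512 limit comes from the calibration loader, which silently drops any sample that encodes to more than 512 tokens:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/CodeQwen1.5-7B"  # placeholder; substitute your model
quant_path = "codeqwen-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# calib_data may be a list of raw strings. Any sample that encodes to more
# than 512 tokens is silently skipped by the calibration loader, which is
# the subject of this issue.
calib_data = ["def quicksort(arr): ...", "class Trie: ..."]  # your dataset here

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```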

WanBenLe commented 1 month ago

You can look at my GitHub repo AutoAWQ-with-quantizer (built against autoawq==0.2.4) for code that changes the quantizer: https://github.com/WanBenLe/AutoAWQ-with-quantizer
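
For anyone who wants the general pattern rather than the full repo, the sketch below (my illustration, not code from that repo) swaps AutoAWQ's calibration loader for one that truncates long samples to `block_size` instead of dropping them, then rebinds the name the quantizer imported. The module path `awq.quantize.quantizer` and the helper's signature follow the 0.2.x source linked later in this thread; verify them against your installed version:

```python
import torch
from datasets import load_dataset
import awq.quantize.quantizer as awq_quantizer


def get_calib_dataset_long(data="pileval", tokenizer=None, n_samples=128,
                           block_size=512, split="train", text_column="text"):
    # Mirror the original helper: accept an HF dataset name or a plain list
    # of strings (the special-cased "pileval" alias is omitted in this sketch).
    if isinstance(data, str):
        dataset = load_dataset(data, split=split).shuffle(seed=42)
        texts = (row[text_column] for row in dataset)
    else:
        texts = data

    samples = []
    for text in texts:
        ids = tokenizer.encode(text.strip())
        if not ids:
            continue
        # Truncate long samples to block_size instead of skipping them.
        samples.append(torch.tensor([ids[:block_size]]))
        if len(samples) >= n_samples:
            break

    # Concatenate and re-split into fixed block_size chunks, as the original does.
    cat_samples = torch.cat(samples, dim=1)
    n_split = cat_samples.shape[1] // block_size
    return [cat_samples[:, i * block_size:(i + 1) * block_size] for i in range(n_split)]


# The quantizer imports the helper by name, so rebind it on that module
# before calling model.quantize(...).
awq_quantizer.get_calib_dataset = get_calib_dataset_long
```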

casper-hansen commented 1 month ago

This is next on the roadmap. I want to completely rework how calibration is executed and document it.

edbeeching commented 1 month ago

I just spotted this issue as well: calibration inputs are capped at 512 tokens regardless of the block size provided: https://github.com/casper-hansen/AutoAWQ/blob/5f3785dcaa107ca76f5fa5355f459370c86f82d6/awq/utils/calib_data.py#L50
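
For reference, the linked line skips any sample whose encoding exceeds 512 tokens, so long documents are dropped rather than truncated. Until the calibration rework lands, a workaround that leaves the library untouched is to pre-chunk long documents into pieces under the cap before passing them as `calib_data`. A minimal sketch, with a placeholder model path and `long_samples` standing in for your dataset:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")  # placeholder

def chunk_texts(texts, tokenizer, max_len=512):
    """Split each document into decoded chunks of at most max_len tokens,
    so no sample trips the hardcoded 512-token skip in the calibration loader."""
    chunks = []
    for text in texts:
        ids = tokenizer.encode(text, add_special_tokens=False)
        for i in range(0, len(ids), max_len):
            chunks.append(tokenizer.decode(ids[i : i + max_len]))
    return chunks

long_samples = ["..."]  # your long-context calibration documents
calib_data = chunk_texts(long_samples, tokenizer)
# pass calib_data to model.quantize(..., calib_data=calib_data)
```

Note that chunking only avoids the cap; it does not give the quantizer true long-context activation statistics, which is what the calibration rework mentioned above is meant to address.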