casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License
1.79k stars 217 forks

Please add longer calibration length to the backlog #376

Open dustyatx opened 9 months ago

dustyatx commented 9 months ago

I've seen the other issues where people have called out the 512-token custom calibration limit. I get that it was inherited from the source project. Hopefully you can add this to the backlog, since it eliminates a lot of real-world use cases.

On a fine-tuned task, quality falls off a cliff with the default calibration set, and since most of the use cases I'm seeing require much longer sequences than 512 tokens, skipping those examples eliminates my ability to quantize using custom calibration data.
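For context, the "skipping" described above comes from the calibration loader inherited from the upstream llm-awq project, which drops any sample whose token count exceeds the block size rather than truncating it. A minimal sketch of that filtering behavior (the function name and the toy whitespace tokenizer are illustrative, not AutoAWQ's actual code):

```python
# Sketch of the calibration filtering behavior discussed in this issue
# (illustrative only, not AutoAWQ's actual implementation). Samples whose
# token count exceeds the block size are skipped entirely, so long
# fine-tuning examples never contribute to calibration statistics.

MAX_CALIB_SEQ_LEN = 512  # the inherited hard limit discussed in this issue

def filter_calib_samples(samples, tokenize, max_len=MAX_CALIB_SEQ_LEN):
    """Keep only samples that fit within the calibration block size."""
    kept, skipped = [], 0
    for text in samples:
        tokens = tokenize(text)
        if not tokens or len(tokens) > max_len:
            skipped += 1  # long (or empty) samples are silently dropped
            continue
        kept.append(tokens)
    return kept, skipped

# Toy whitespace "tokenizer" standing in for a real HF tokenizer.
toy_tokenize = str.split

samples = ["short prompt", " ".join(["tok"] * 600), "another short one"]
kept, skipped = filter_calib_samples(samples, toy_tokenize)
print(len(kept), skipped)  # prints: 2 1  (the 600-token sample is dropped)
```

With a calibration set where most examples exceed 512 tokens, nearly everything is filtered out, which is exactly why custom calibration becomes unusable for long-output tasks.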

I'm very willing to have my quantization take many times longer if needed; an accurate model is more important than the time it takes to quantize it.

It's a wonderfully useful project otherwise. Super easy to use. =D

casper-hansen commented 9 months ago

Hi @dustyatx, I agree it would be nice to have a more flexible calibration method. Currently, I do not have a plan to modify it, as I am busy with other work. I welcome any contributions, though; I think the dataset preparation needs a complete rework.

dustyatx commented 9 months ago

I totally understand; that's what a backlog is for: the work that needs to be done, even if it can't be done anytime soon. At the very least it documents the issue. I wish I could contribute at this level, but unfortunately I'm more of a designer and builder than a data scientist.

I've noticed there has been debate on the Reddit forums and elsewhere, where some people have a great experience with quantized models and others don't. Now that I understand this 512-token limitation, it makes sense: people using the models for chatbots have a great experience, while people trying to use them for NLP tasks see a lot of accuracy issues (my tasks hit >45% error rates).