PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0

[Feature]: Support hqq quantize method. #418

Open · Minami-su opened this issue 1 month ago


🚀 The feature, motivation and pitch

https://mobiusml.github.io/hqq_blog/

HQQ (Half-Quadratic Quantization) is a fast, accurate model quantizer that needs no calibration data. Its optimizer is simple to implement, taking only a few lines of code, and it can quantize the Llama2-70B model in about 4 minutes! 🚀
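The linked blog post describes how HQQ works without calibration data: it keeps the quantization scale fixed and fits the zero-point with a half-quadratic solver. The solver alternates between a sparse error term, obtained with an l_p shrinkage operator (p < 1), and a closed-form zero-point update. Below is a minimal pure-Python sketch of that idea for a single weight group. The function names (`shrink_lp`, `quantize`, `dequantize`, `hqq_group`) are hypothetical. The hyperparameters (p = 0.7, beta = 10, kappa = 1.01, 20 iterations) are illustrative assumptions. This is not the `hqq` library's API and not anything aphrodite-engine currently exposes.

```python
import math


def shrink_lp(x, beta, p):
    """Generalized soft-thresholding: proximal step for |x|^p with p < 1."""
    # sign(x) * max(|x| - |x|^(p-1) / beta, 0); the epsilon avoids 0 ** negative.
    mag = abs(x)
    t = mag - (mag + 1e-8) ** (p - 1) / beta
    return math.copysign(max(t, 0.0), x)


def quantize(w, scale, zero, nbits):
    """Affine quantization: round(w / scale + zero), clamped to [0, 2^nbits - 1]."""
    qmax = 2 ** nbits - 1
    return [min(max(round(x / scale + zero), 0), qmax) for x in w]


def dequantize(wq, scale, zero):
    """Map integer codes back to floats: scale * (q - zero)."""
    return [scale * (q - zero) for q in wq]


def hqq_group(w, nbits=4, p=0.7, beta=10.0, kappa=1.01, iters=20):
    """Calibration-free zero-point search for one weight group (hypothetical sketch)."""
    qmax = 2 ** nbits - 1
    lo, hi = min(w), max(w)
    # Scale stays fixed at its min-max value; only the zero-point is optimized.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = -lo / scale  # min-max initialization maps lo -> 0 and hi -> qmax
    best_zero, best_err = zero, float("inf")
    for _ in range(iters):
        wq = quantize(w, scale, zero, nbits)
        wr = dequantize(wq, scale, zero)
        err = sum(abs(a - b) for a, b in zip(w, wr)) / len(w)
        if err < best_err:
            best_err, best_zero = err, zero
        else:
            break  # stop as soon as the reconstruction error stops improving
        # Sub-problem 1: sparse error term via the l_p shrinkage operator.
        we = [shrink_lp(a - b, beta, p) for a, b in zip(w, wr)]
        # Sub-problem 2: closed-form zero-point update (mean over the group).
        zero = sum(q - (a - e) / scale for q, a, e in zip(wq, w, we)) / len(w)
        beta *= kappa  # gradually tighten the quadratic penalty
    return quantize(w, scale, best_zero, nbits), scale, best_zero
```

Calling `hqq_group(w, nbits=4)` on a list of floats returns 4-bit integer codes, the scale, and the chosen zero-point. The first iteration evaluates the plain min-max zero-point, and the loop only keeps an update when it lowers the mean absolute reconstruction error. As a result, the returned zero-point is never worse than min-max on that metric. A real implementation would vectorize this over many groups, for example with PyTorch tensors.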

Alternatives

No response

Additional context

No response