Because vllm-gptq does not have issues enabled, I'm raising the issue here.
https://mobiusml.github.io/hqq_blog/
HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a few lines of code for the optimizer). It can crunch through quantizing the Llama2-70B model in only 4 minutes! 🚀
I hope to use HQQ models with vllm-gptq.
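For context, here is a condensed sketch of the zero-point optimizer described in the linked blog post: it minimizes a sparsity-promoting l_p error (p < 1) between the original weights and their dequantized version using a half-quadratic proximal solver, with no calibration data. Function and parameter names here are my own for illustration, and the defaults are assumptions based on the blog.

```python
# Sketch of HQQ's calibration-free zero-point optimization, adapted from
# the blog post above. Only the zero-point is optimized; the scale stays
# fixed, which keeps each iteration cheap.
import torch

def optimize_zero_proximal(W, scale, zero, min_max, axis=0,
                           lp_norm=0.7, beta=10.0, kappa=1.01, iters=20):
    # Generalized soft-thresholding: the proximal operator of the
    # l_p norm with p < 1 (promotes a sparse residual).
    def shrink(x, beta):
        return torch.sign(x) * torch.relu(
            torch.abs(x) - (1.0 / beta) * torch.pow(torch.abs(x), lp_norm - 1))

    for _ in range(iters):
        # Quantize with the current zero-point, then dequantize.
        W_q = torch.round(W * scale + zero).clamp(min_max[0], min_max[1])
        W_r = (W_q - zero) / scale
        # Sub-problem 1: extract a sparse error term via shrinkage.
        W_e = shrink(W - W_r, beta)
        # Sub-problem 2: closed-form zero-point update given the error.
        zero = torch.mean(W_q - (W - W_e) * scale, dim=axis, keepdim=True)
        beta *= kappa  # anneal the quadratic penalty
    return zero
```

Because each iteration is just elementwise ops plus a mean, this is what lets HQQ quantize something as large as Llama2-70B in minutes.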
Sorry, I missed the message. I'll look into it later.