InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0
4.33k stars 390 forks

[Feature] Support QuaRot quantization scheme #1489

Open serser opened 5 months ago

serser commented 5 months ago

Motivation

QuaRot (https://arxiv.org/abs/2404.00456) has been out for three weeks, and the preliminary results are convincing. Also see the discussion in llama.cpp with the QuaRot authors. It would be amazing to have it supported in LMDeploy by default.

Best.

Related resources

https://github.com/ggerganov/llama.cpp/issues/6444
https://arxiv.org/abs/2404.00456

Additional context

No response

lvhan028 commented 5 months ago

@pppppM @AllentDan @lzhangzz may investigate the QuaRot quantization algorithm; it looks very promising.