Open serser opened 5 months ago
@pppppM @AllentDan @lzhangzz may want to investigate the QuaRot quantization algorithm; it looks very promising.
Motivation
QuaRot (https://arxiv.org/abs/2404.00456) has been out for three weeks. Preliminary results are convincing. Also see the discussion in llama.cpp with the QuaRot authors. It would be amazing to have it supported in LMDeploy by default.

Best.
Related resources
https://github.com/ggerganov/llama.cpp/issues/6444
https://arxiv.org/abs/2404.00456
Additional context
No response