InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] W4A8-FP8 support in AWQ quantization #2766

Open · yongchaoding opened this issue 2 weeks ago

yongchaoding commented 2 weeks ago

Motivation

As we all know, LMDeploy runs fastest with AWQ W4A16. However, FP8 is now used in many places, so I wonder whether the developers have any plan to implement a fast W4A8-FP8 kernel in LMDeploy?
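(For context, a minimal sketch of the existing AWQ W4A16 path the issue refers to; the checkpoint name is a placeholder for any AWQ-quantized model. Quantization itself is done offline with `lmdeploy lite auto_awq <hf-model> --work-dir <out-dir>`.)

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Load a pre-quantized AWQ W4A16 checkpoint with the TurboMind backend.
# A W4A8-FP8 scheme would presumably be exposed as another model format.
pipe = pipeline(
    "internlm/internlm2_5-7b-chat-4bit",  # placeholder: any AWQ W4A16 checkpoint
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["Hello!"]))
```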

Related resources

No response

Additional context

No response

dingjingzhen commented 1 week ago

+1

lzhangzz commented 1 week ago

I will start working on W8A8 after my current work is done. W4A8 should come after W8A8.
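(For readers unfamiliar with the "A8-FP8" half of these schemes, here is an illustrative sketch only, not LMDeploy code: it simulates per-tensor FP8 e4m3 quantization of activations, the step a W8A8- or W4A8-FP8 kernel would fuse into the GEMM. It assumes PyTorch >= 2.1 for the `torch.float8_e4m3fn` dtype.)

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def quantize_fp8_e4m3(x: torch.Tensor):
    """Quantize a tensor to simulated FP8 e4m3 with a per-tensor scale."""
    scale = x.abs().max().clamp(min=1e-12) / E4M3_MAX
    x_fp8 = (x / scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original tensor."""
    return x_fp8.to(torch.float32) * scale

x = torch.randn(4, 8)
x_q, s = quantize_fp8_e4m3(x)
err = (dequantize(x_q, s) - x).abs().max()
print(f"scale={s.item():.6f}, max abs error={err.item():.4f}")
```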