Add ICLR24 spotlight paper OmniQuant.

AIoT-MLSys-Lab / Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

https://arxiv.org/abs/2312.03863

Closed by ChenMnZ 7 months ago

ChenMnZ commented 7 months ago

OmniQuant is a post-training quantization (PTQ) method that supports both weight-only quantization and weight-activation quantization.
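
For readers unfamiliar with the two settings, here is a rough illustration of the difference. It uses plain min-max round-to-nearest quantization, which is a generic baseline and not OmniQuant's own procedure. The helper names (`fake_quantize`, `quantized_linear`), the bit-widths, and the toy numbers are all assumptions made for this sketch.

```python
def fake_quantize(values, n_bits):
    """Uniform asymmetric min-max quantization, then dequantization.

    Generic round-to-nearest baseline (an assumption for illustration),
    not the OmniQuant algorithm.
    """
    levels = 2 ** n_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a constant vector, where the range would be zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) * scale + lo for v in values]


def linear(weights, x):
    """y = W x, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def quantized_linear(weights, x, w_bits, a_bits=None):
    """Linear layer with quantized weights and, optionally, quantized inputs."""
    # Weights are quantized per output channel (one scale per row).
    q_weights = [fake_quantize(row, w_bits) for row in weights]
    # Weight-only mode leaves the activations untouched.
    q_x = fake_quantize(x, a_bits) if a_bits is not None else x
    return linear(q_weights, q_x)


# Hypothetical toy layer and input.
weights = [[0.12, -0.53, 0.91, 0.07], [-0.34, 0.48, -0.02, 0.66]]
x = [1.5, -0.25, 0.8, 2.0]

reference = linear(weights, x)
weight_only = quantized_linear(weights, x, w_bits=4)
weight_act = quantized_linear(weights, x, w_bits=4, a_bits=4)

print("full precision   :", reference)
print("weight-only      :", weight_only)
print("weight-activation:", weight_act)
```

In the weight-only call the inputs stay in floating point, so only the stored weights shrink. In the weight-activation call both operands are restricted to a small set of levels, which is what would allow low-bit matrix multiplication on real hardware.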