Closed by xingchensong 6 months ago
cc @ChenMnZ. BTW, I plan to replicate OdysseyLLM (OmniQuant + GPTQ; see sections 5.1 and 5.2) using this repository and will submit a PR once it is done.
@xingchensong Thanks for contributing symmetric quantization.
I'm also looking forward to your reproduction of OdysseyLLM.
What does this PR do?
Implements OdysseyLLM-style symmetric quantization, which disables the zero_point and is therefore more hardware-efficient than the current version.
ref: https://arxiv.org/pdf/2311.09550v1.pdf
Current version: ![image](https://github.com/OpenGVLab/OmniQuant/assets/13466943/fdbde5fc-fb62-41ed-8ec3-998108417808)
Odyssey version: ![image](https://github.com/OpenGVLab/OmniQuant/assets/13466943/2b5dc020-1b83-4d27-8462-573b83a355ee)
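
To make the difference concrete, here is a minimal pure-Python sketch of the two schemes applied to a single row of values. It is an illustration under assumptions, not the code in this PR: the function names `quantize_asymmetric` and `quantize_symmetric` are hypothetical, the signed range `[-2^(b-1), 2^(b-1) - 1]` is one common choice for the symmetric case, and any learnable clipping used by OmniQuant or OdysseyLLM is left out.

```python
def quantize_asymmetric(row, n_bits=4):
    """Fake-quantize one row with a zero_point (unsigned range [0, 2^b - 1])."""
    qmax = 2 ** n_bits - 1
    lo, hi = min(row), max(row)
    scale = max((hi - lo) / qmax, 1e-8)      # guard against a constant row
    zero_point = round(-lo / scale)          # integer offset that maps lo near 0
    q = [min(max(round(x / scale) + zero_point, 0), qmax) for x in row]
    dequant = [(v - zero_point) * scale for v in q]
    return dequant, scale, zero_point


def quantize_symmetric(row, n_bits=4):
    """Fake-quantize one row without a zero_point (signed range, assumed here)."""
    qmax = 2 ** (n_bits - 1) - 1             # e.g. 7 for 4 bits
    qmin = -qmax - 1                         # e.g. -8 for 4 bits
    absmax = max(abs(x) for x in row)
    scale = max(absmax / qmax, 1e-8)         # guard against an all-zero row
    q = [min(max(round(x / scale), qmin), qmax) for x in row]
    dequant = [v * scale for v in q]         # no zero_point to subtract
    return dequant, scale


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 0.7]
    print(quantize_asymmetric(weights))
    print(quantize_symmetric(weights))
```

Calling either function once per row of a weight matrix corresponds to per-channel weight quantization, and calling it once per row of an activation matrix with `n_bits=8` corresponds to per-token activation quantization. Without a zero_point, an integer matmul only needs a final multiplication by the two scales, which is where the hardware-efficiency argument comes from.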
Benchmark (W4A8, W per-channel, A per-token)
Model: Llama-2-7b-chat
Reproduce