OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

[quantizer] add Odyssey-style symmetric quantization #56

Closed xingchensong closed 6 months ago

xingchensong commented 6 months ago

What does this PR do?

Implements OdysseyLLM-style symmetric quantization, which disables the zero_point. Without a zero point, quantized values are recovered with a single scale multiplication, which is more hardware-efficient than the current version.

ref: https://arxiv.org/pdf/2311.09550v1.pdf

current version: image

Odyssey version: image
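Since the two formula images are not reproduced here, the difference can be illustrated with a minimal sketch. It is not the code from this PR: the function names `quantize_asymmetric` and `quantize_symmetric` are hypothetical, plain min-max / abs-max calibration is assumed, and plain Python lists stand in for tensors. Note that Python's `round` rounds halves to even, which may differ from a given framework's rounding.

```python
def quantize_asymmetric(values, n_bits):
    """Min-max asymmetric quantization: needs a scale AND a zero point."""
    qmax = 2 ** n_bits - 1                       # unsigned range [0, qmax]
    lo, hi = min(values), max(values)
    scale = max(hi - lo, 1e-8) / qmax
    zero_point = round(-lo / scale)              # integer offset that represents 0.0
    q = [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]
    dequant = [(qi - zero_point) * scale for qi in q]
    return q, dequant, scale, zero_point


def quantize_symmetric(values, n_bits):
    """Symmetric quantization without a zero point: integers are centered on 0."""
    qmax = 2 ** (n_bits - 1) - 1                 # signed range [-qmax - 1, qmax]
    absmax = max(abs(v) for v in values)
    scale = max(absmax, 1e-8) / qmax
    q = [min(max(round(v / scale), -qmax - 1), qmax) for v in values]
    dequant = [qi * scale for qi in q]           # a single multiply, no offset
    return q, dequant, scale


# Illustrative values only (not taken from any model).
weights = [-0.9, -0.2, 0.0, 0.35, 0.7]
q_asym, dq_asym, s_asym, zp = quantize_asymmetric(weights, 4)
q_sym, dq_sym, s_sym = quantize_symmetric(weights, 4)
```

In the symmetric form, dequantization is `q * scale`, so an integer matmul result can be rescaled directly without the zero-point correction terms that the asymmetric form requires.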

Benchmark (W4A8, W per-channel, A per-token)

Model: Llama-2-7b-chat

| Calibration dataset | PPL (wiki2) | PPL (ptb) | PPL (c4) | Additional args |
|---|---|---|---|---|
| NONE (fp16) | 7.076 | 28.138 | - | - |
| wiki2 | 7.456 | 51.077 | - | - |
| ptb | 7.638 | 30.797 | 9.648 | - |
| mix (wiki2 + ptb + c4) | 7.485 | 33.096 | 9.487 | - |
| mix (wiki2 + ptb + c4) | 7.575 | 33.673 | 9.550 | --symmetric |
| mix (wiki2 + ptb + c4) | 7.577 | 32.644 | 9.522 | --symmetric --disable_zero_point |
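For readers unfamiliar with the "W per-channel, A per-token" setting used in this benchmark, here is a rough sketch of what the granularity means with symmetric, zero-point-free quantization. It is an illustration under assumptions, not this repository's implementation: the helpers `absmax_scales` and `quantize_rows` are hypothetical, abs-max calibration is assumed, and the small matrices are made-up values.

```python
def absmax_scales(rows, n_bits):
    """One symmetric scale per row (per output channel or per token)."""
    qmax = 2 ** (n_bits - 1) - 1
    return [max(max(abs(v) for v in row), 1e-8) / qmax for row in rows]


def quantize_rows(rows, scales, n_bits):
    """Quantize each row with its own scale; no zero point is used."""
    qmax = 2 ** (n_bits - 1) - 1
    return [[min(max(round(v / s), -qmax - 1), qmax) for v in row]
            for row, s in zip(rows, scales)]


# Illustrative data: W is [out_channels][in_features], X is [tokens][in_features].
W = [[0.6, -1.0, 0.25], [0.03, 0.01, -0.04]]
X = [[1.0, 2.0, -3.0], [0.12, -0.2, 0.05]]

w_scales = absmax_scales(W, 4)   # W4: one scale per output channel
x_scales = absmax_scales(X, 8)   # A8: one scale per token
Wq = quantize_rows(W, w_scales, 4)
Xq = quantize_rows(X, x_scales, 8)

# Integer dot products, then a single rescale per (token, channel) pair.
Y = [[sum(xq * wq for xq, wq in zip(x_row, w_row)) * sx * sw
      for w_row, sw in zip(Wq, w_scales)]
     for x_row, sx in zip(Xq, x_scales)]

# Full-precision reference for comparison.
Y_ref = [[sum(x * w for x, w in zip(x_row, w_row)) for w_row in W] for x_row in X]
```

The second weight row is much smaller in magnitude than the first. With its own scale it still uses the full 4-bit range, whereas a single per-tensor scale of 1/7 would round all three of its entries to zero.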

Reproduce

# https://github.com/OpenGVLab/OmniQuant/issues/37
CUDA_VISIBLE_DEVICES=0 python main.py \
  --model /jfs-hdfs/user/xingchen.song/share/LLM/Llama-2-7b-chat --eval_ppl \
  --epochs 60 --output_dir ./log/Llama-2-7b-chat-w4a8-ep60-mix-sym-odyssey \
  --wbits 4 --abits 8 --lwc --aug_loss --deactive_amp \
  --let --let_lr 1e-3 --alpha 0.75 \
  --calib_dataset mix --symmetric --disable_zero_point

xingchensong commented 6 months ago

cc @ChenMnZ. BTW, I am eager to replicate OdysseyLLM (OmniQuant + GPTQ, see sections 5.1 & 5.2 of the paper) using this repository and will submit a PR upon completion.

ChenMnZ commented 6 months ago

@xingchensong Thanks for contributing the symmetric quantization.

I am also looking forward to your reproduction of OdysseyLLM.