feat: adapt GPTQ to fp4 quantization

efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving


Closed happierpig closed 7 months ago

happierpig commented 7 months ago

This PR integrates FP4 quantization (a non-uniform quantization format) into the GPTQ codebase. Atom can now apply FP4 quantization to weights.
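For readers unfamiliar with non-uniform quantization, here is a minimal sketch of rounding weights to an FP4 grid. It is not the code from this PR: the E2M1 value set, the per-group absmax scaling, and the names `FP4_E2M1_MAGNITUDES`, `FP4_GRID`, and `quantize_fp4` are all assumptions made for illustration. It covers only the round-to-nearest step and leaves out GPTQ's error compensation.

```python
# Assumed FP4 (E2M1) magnitudes; the grid actually used in this PR may differ.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Signed grid in ascending order. The spacing between points is not constant,
# which is what makes the format non-uniform.
FP4_GRID = [-m for m in reversed(FP4_E2M1_MAGNITUDES[1:])] + FP4_E2M1_MAGNITUDES


def quantize_fp4(weights):
    """Round one group of weights to the nearest FP4 grid point.

    Hypothetical helper: scales the group so its largest magnitude maps to
    the largest grid value, rounds, then scales back (fake quantization).
    Returns (dequantized_weights, scale).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        # All-zero (or empty) group: nothing to scale.
        return [0.0] * len(weights), 1.0

    scale = max_abs / FP4_E2M1_MAGNITUDES[-1]
    dequantized = [
        min(FP4_GRID, key=lambda g: abs(w / scale - g)) * scale
        for w in weights
    ]
    return dequantized, scale


if __name__ == "__main__":
    values, scale = quantize_fp4([0.1, -0.7, 3.0, -6.0])
    print(values, scale)
```

In a GPTQ-style flow, a rounding function like this would replace the uniform integer rounding applied to each weight column, while the surrounding error-propagation logic stays the same.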

cylinbao commented 7 months ago

LGTM.