feat: add FP4 evaluations

efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving


Closed happierpig closed 7 months ago

happierpig commented 7 months ago

This PR introduces the following enhancements:

Integrate support for new data formats into Atom, e.g., FP4 (https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf), using bitsandbytes for quantization (https://github.com/TimDettmers/bitsandbytes).
Polish the code and add more comments. Polish README.md.
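
For readers unfamiliar with FP4, here is a minimal sketch of what simulated ("fake") FP4 quantization could look like. It is not the code in this PR and does not call bitsandbytes. It assumes the E2M1 value set (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), a plain floating-point absmax scale per block rather than a power-of-two shared scale, and a block size of 32. The function names are hypothetical.

```python
# Illustrative only: simulated FP4 (E2M1) quantization with a per-block absmax scale.
# Helper names are hypothetical; this is not Atom's or bitsandbytes' implementation.

# Non-negative magnitudes representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = E2M1_MAGNITUDES[-1]


def quantize_block_fp4(block):
    """Return (scale, dequantized values) for one block of floats."""
    absmax = max((abs(x) for x in block), default=0.0)
    if absmax == 0.0:
        # All-zero (or empty) block: nothing to scale.
        return 1.0, [0.0 for _ in block]
    # Assumption: a float scale that maps the largest magnitude onto 6.0.
    scale = absmax / E2M1_MAX
    dequantized = []
    for x in block:
        target = abs(x) / scale
        # Round to the nearest representable magnitude.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        dequantized.append((nearest if x >= 0 else -nearest) * scale)
    return scale, dequantized


def fake_quantize_fp4(values, block_size=32):
    """Quantize then dequantize a flat list of floats, block by block."""
    result = []
    for start in range(0, len(values), block_size):
        _, deq = quantize_block_fp4(values[start:start + block_size])
        result.extend(deq)
    return result
```

Calling `fake_quantize_fp4` on a list of weights returns values of the same length, each snapped to the nearest scaled FP4 level, which is one way to estimate the accuracy impact of a format before writing a real kernel.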