efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

feat: add FP4 evaluations #11

Closed happierpig closed 7 months ago

happierpig commented 7 months ago

This PR introduces the following enhancements:

  1. Integrate support for new data formats into Atom, e.g., FP4 (https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). Quantization uses bitsandbytes (https://github.com/TimDettmers/bitsandbytes).
  2. Polish the code and add more comments. Polish README.md.
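
For readers unfamiliar with FP4, here is a minimal sketch of simulated (quantize then dequantize) FP4 rounding. It is not the code in this PR, which relies on bitsandbytes, and it does not reproduce the bitsandbytes kernels. It assumes the E2M1 magnitudes listed in the OCP MX spec linked above, and it uses a plain per-block absmax float scale rather than the spec's shared power-of-two scale. The names `quantize_block_fp4` and `fake_quantize_fp4` and the default `block_size=32` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: simulated FP4 quantization with per-block absmax scaling.
# Assumption: E2M1 magnitudes (2 exponent bits, 1 mantissa bit) from the OCP MX spec.
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_MAGNITUDES[-1]


def quantize_block_fp4(block):
    """Quantize one block of floats; return (scale, dequantized values)."""
    absmax = max((abs(v) for v in block), default=0.0)
    if absmax == 0.0:
        # All-zero (or empty) block: any scale works, so use 1.0.
        return 1.0, [0.0 for _ in block]
    # Map the largest magnitude in the block onto the largest FP4 value.
    scale = absmax / FP4_MAX
    dequantized = []
    for v in block:
        magnitude = abs(v) / scale
        # Round to the nearest representable FP4 magnitude.
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - magnitude))
        signed = nearest if v >= 0 else -nearest
        dequantized.append(signed * scale)
    return scale, dequantized


def fake_quantize_fp4(values, block_size=32):
    """Apply block-wise simulated FP4 quantization to a flat list of floats."""
    out = []
    for start in range(0, len(values), block_size):
        _, dequantized = quantize_block_fp4(values[start:start + block_size])
        out.extend(dequantized)
    return out
```

With only eight magnitudes per sign, the rounding error depends heavily on how well the per-block scale matches the value distribution, which is why block-wise scaling is used here instead of a single tensor-wide scale.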