PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0

feat: FP8 E4M3 KV Cache #405

Closed AlpinDale closed 1 month ago

AlpinDale commented 1 month ago

Similar to the INT8 KV cache, this PR adds a scaled FP8 E4M3 KV cache for both NVIDIA (through AMMO) and AMD (through the AMD quantizer). More details later.
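
For readers unfamiliar with what "scaled" means here, the following is a minimal pure-Python sketch of scaled FP8 E4M3 quantization, not the code in this PR. It assumes the E4M3 "fn" variant (3 mantissa bits, exponent bias 7, largest finite value 448, no infinities) and a single per-tensor scale derived from the absolute maximum. In the PR the scaling factors come from AMMO or the AMD quantizer instead. The names `round_to_e4m3`, `quantize_kv`, and `dequantize_kv` are hypothetical and exist only for illustration.

```python
import math

E4M3_MAX = 448.0   # largest finite E4M3 value (assumes the "fn" variant)
E4M3_MIN_EXP = -6  # exponent of the smallest normal E4M3 number


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1, so floor(log2(abs(x))) = e - 1.
    _, e = math.frexp(abs(x))
    exp = max(e - 1, E4M3_MIN_EXP)  # subnormals share the minimum exponent
    step = 2.0 ** (exp - 3)         # spacing of representable values (3 mantissa bits)
    return round(x / step) * step   # Python's round() rounds half to even


def quantize_kv(values, scale=None):
    """Hypothetical helper: scale values into E4M3 range and round them."""
    if scale is None:
        # Assumption: per-tensor scale that maps the largest magnitude to 448.
        amax = max(abs(v) for v in values)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize_kv(quantized, scale):
    """Hypothetical helper: recover approximate values from the stored ones."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    keys = [0.5, -2.0, 896.0]
    q, scale = quantize_kv(keys)
    print(q, scale)
    print(dequantize_kv(q, scale))
```

The scale is what lets values outside the native E4M3 range survive: without it, anything above 448 in magnitude would saturate. A real implementation stores each element in 8 bits and applies the scale inside the attention kernel. This sketch keeps Python floats throughout so the rounding behavior is easy to inspect.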