Closed AlpinDale closed 1 month ago
Similar to the INT8 KV cache, this PR adds a scaled FP8 (e4m3) KV cache for both NVIDIA (through AMMO) and AMD (through the AMD quantizer). More details to follow.
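To illustrate what "scaled FP8_e4m3" means, here is a minimal pure-Python sketch of scaling, rounding to the e4m3 grid, and dequantizing. It is not the code from this PR: the helper names (`round_to_e4m3`, `compute_scale`, `quantize`, `dequantize`) are hypothetical, and deriving the scale from the tensor's absolute maximum is an assumption made for the example. In the PR the scaling factors come from AMMO or the AMD quantizer.

```python
import math

E4M3_MAX = 448.0     # largest finite value in the FP8 e4m3 (fn) format
E4M3_MIN_EXP = -6    # exponent of the smallest normal value, 2**-6
MANTISSA_BITS = 3    # explicit mantissa bits in e4m3


def round_to_e4m3(x: float) -> float:
    """Round a finite float to the nearest e4m3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, ex = math.frexp(a)               # a = m * 2**ex with 0.5 <= m < 1
    e = max(ex - 1, E4M3_MIN_EXP)       # clamp so subnormals share one step size
    step = 2.0 ** (e - MANTISSA_BITS)   # spacing between neighbours at this exponent
    return sign * round(a / step) * step  # round() breaks ties to even


def compute_scale(values):
    """Per-tensor scale mapping the largest magnitude onto E4M3_MAX (assumption)."""
    amax = max((abs(v) for v in values), default=0.0)
    return amax / E4M3_MAX if amax > 0 else 1.0


def quantize(values, scale):
    """Divide by the scale, then snap each value to the e4m3 grid."""
    return [round_to_e4m3(v / scale) for v in values]


def dequantize(qvalues, scale):
    """Multiply by the scale to recover approximate original values."""
    return [q * scale for q in qvalues]


kv = [0.5, -2.0, 8.0, 0.01]            # made-up stand-in for cached K/V entries
scale = compute_scale(kv)
restored = dequantize(quantize(kv, scale), scale)
```

With a single scale per tensor, values far below the maximum fall into the subnormal range and lose precision, which is why the choice of scaling factor matters.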