Open KnightYao opened 3 months ago
Is there support for storing the KV cache in fp8 or int8 while still computing in fp16? Reading an int8 KV cache is faster than reading an fp16 one; the values could then be converted from int8 to fp16 in shared memory before the computation.
Not yet. PRs are welcome.
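For anyone picking this up, here is a rough reference for the numerics of the scheme described in the question: keys are stored as int8 codes plus one scale per row, and converted back to fp16 only when they are used. This is a minimal sketch, not this project's API. The function names are hypothetical, symmetric per-row scaling is an assumed choice (the issue does not specify a quantization scheme), and a real kernel would do the conversion on the GPU in shared memory or registers rather than in Python.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(row):
    """Symmetric per-row quantization (assumed scheme).

    Returns (codes, scale) where codes are ints in [-127, 127]
    and scale is an fp16-representable float.
    """
    max_abs = max(abs(v) for v in row)
    scale = to_fp16(max_abs / 127.0)
    if scale == 0.0:
        # All-zero (or vanishingly small) row: any non-zero scale works.
        scale = 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in row]
    return codes, scale


def dequantize_fp16(codes, scale):
    """Convert int8 codes back to fp16 values, as a kernel would after loading."""
    return [to_fp16(c * scale) for c in codes]


def attention_scores(query, cached_keys):
    """Dot product of one query with every cached key.

    cached_keys is a list of (codes, scale) pairs; each key is
    dequantized just before use, so only int8 data is kept in the cache.
    """
    scores = []
    for codes, scale in cached_keys:
        key = dequantize_fp16(codes, scale)
        scores.append(sum(q * k for q, k in zip(query, key)))
    return scores
```

The cache then holds one byte per element plus one scale per row instead of two bytes per element, which is where the memory-bandwidth saving would come from. The cost is a quantization error of at most about one scale step per element.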