Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Does it support storing the KV cache in fp8 or int8 while computing in fp16? #1008

Open KnightYao opened 3 months ago

KnightYao commented 3 months ago

Does it support storing the KV cache in fp8 or int8 while still computing in fp16? Reading the KV cache as int8 is faster than reading it as fp16; the int8 values could then be converted to fp16 in shared memory before the attention computation.
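The idea the asker describes can be sketched in plain Python (a hypothetical illustration, not flash-attention's API): store KV values as int8 codes plus a scale, then dequantize back to floating point right before the fp16 attention math.

```python
# Hypothetical sketch of int8 KV-cache storage with dequantization before
# the fp16 compute path. Function names are illustrative, not from the repo.

def quantize_int8(values):
    """Symmetric int8 quantization: returns (int8 codes, scale)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Convert int8 codes back to float for the higher-precision matmul."""
    return [c * scale for c in codes]

# Example: a toy slice of K values stored compactly, then restored.
k = [0.5, -1.2, 3.4, 0.0]
codes, scale = quantize_int8(k)
k_restored = dequantize_int8(codes, scale)
# k_restored approximates k; attention scores would then be computed in fp16.
```

On GPU, the win the asker points at is memory bandwidth: the cache is read at half (int8 vs fp16) the bytes, and the conversion happens in fast shared memory before the tensor-core matmul.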

tridao commented 3 months ago

Not yet. PRs are welcome.