ROCm / triton

Development repository for the Triton language and compiler
MIT License
80 stars, 23 forks

fa decode example fp16/int4kv #492

Closed: scxiao closed this 4 months ago

scxiao commented 4 months ago

Hi @vgokhale, @xiaohuguo2023, @zhanglx13, when you get a chance, could you please review this script so we can get it merged soon? Thanks.

vgokhale commented 4 months ago

Do you mind renaming the file to something like flash-attention-decoding.py or flash-attention-splitk.py?

scxiao commented 4 months ago

Hi @vgokhale, when you get a chance, could you please take a look to see if you have other comments? So we can get this merged ASAP. Thanks.

scxiao commented 4 months ago

Hi @vgokhale, could you please help approve this so we can get it merged? Thanks.