ROCm / triton

Development repository for the Triton language and compiler
MIT License
80 stars 23 forks

Support configuring multiple waves in flash-attention #462

Closed scxiao closed 5 months ago

scxiao commented 6 months ago

The current implementation of the flash-attention (FA) decode forward kernel can only be configured with 1 wave per workgroup. This PR adds support for multiple waves per workgroup, which is expected to improve performance.
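
For context, a rough sketch of what the wave count means for workgroup size. In Triton the number of waves per workgroup is set through the `num_warps` launch parameter (a warp corresponds to a wavefront on AMD hardware). The helper below is hypothetical and not taken from this PR; the wavefront size of 64 is an assumption that holds for CDNA-class GPUs, and the wave counts in the loop are illustrative only.

```python
# Hypothetical helper for illustration; not code from this PR.
WAVEFRONT_SIZE = 64  # assumption: CDNA-class AMD GPU (some RDNA parts run wave32)


def threads_per_workgroup(num_waves: int, wavefront_size: int = WAVEFRONT_SIZE) -> int:
    """Return the number of threads in a workgroup made of `num_waves` waves."""
    if num_waves < 1:
        raise ValueError("num_waves must be at least 1")
    return num_waves * wavefront_size


# Candidate wave counts one might sweep when tuning (illustrative values only).
for num_waves in (1, 2, 4):
    print(num_waves, threads_per_workgroup(num_waves))
```

With 1 wave the workgroup has a single wavefront's worth of threads; raising the count scales the workgroup linearly, which is the knob this PR exposes for the decode kernel.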