ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License
141 stars 46 forks

Improve FMHA bwd #70

Closed: rocking5566 closed this PR 3 months ago

rocking5566 commented 4 months ago

This PR integrates the bwd optimization from https://github.com/ROCm/composable_kernel/pull/1397