ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License
141 stars, 46 forks

Enable sequence_parallel in bwd #89

Closed. micmelesse closed this 3 weeks ago

micmelesse commented 3 weeks ago

Enable sequence_parallel, which improves bwd perf.