ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Feature]: Support for newer flash-attention versions (e.g. ≥2.1.0) #53

Open JiahuaZhao opened 1 month ago

JiahuaZhao commented 1 month ago

Suggestion Description

When we need to do long-context inference (using LongLoRA), errors sometimes occur saying that flash-attn version ≥ 2.1.0 is required. So I'm wondering if a newer version will follow.
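
For reference, this is the kind of version gate frameworks typically put around the flash-attn fast path; a minimal sketch only, the flag name and fallback are illustrative and not taken from LongLoRA:

```python
# Illustrative sketch: gate the flash-attn code path on the installed version,
# since the long-context path reportedly requires flash-attn >= 2.1.0.
from packaging import version

try:
    import flash_attn
    HAS_FLASH_ATTN_2_1 = version.parse(flash_attn.__version__) >= version.parse("2.1.0")
except ImportError:
    HAS_FLASH_ATTN_2_1 = False

if not HAS_FLASH_ATTN_2_1:
    # Fall back to a plain PyTorch attention implementation instead of raising.
    pass
```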

Operating System

SUSE

GPU

MI250X

ROCm Component

ROCm 5.4.3

jinsong-mao commented 2 weeks ago

+1. The default branch "flash_attention_for_rocm" is 272 commits behind Tri Dao's repo, and a lot of its APIs are not compatible with some frameworks. Is there any way to resolve this? Any new branches?
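
For context, this API drift is usually papered over with import shims on the framework side; a hedged sketch follows, where which of these entry points the ROCm fork actually exposes is an assumption, not something verified against the repo:

```python
# Hedged sketch of a compatibility shim between flash-attn API generations.
# Whether the ROCm fork exposes the newer or older name is assumed here.
try:
    # Newer upstream 2.x entry point
    from flash_attn import flash_attn_varlen_func as fa_varlen_func
except ImportError:
    # Older name used by earlier releases/forks
    from flash_attn.flash_attn_interface import flash_attn_unpadded_func as fa_varlen_func
```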

turboderp commented 1 week ago

I don't know what another +1 is worth, but catching up specifically with lower-right causal masking and paged attention would make a world of difference for ROCm users.
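
For readers unfamiliar with the distinction: with a KV cache the number of new query tokens is smaller than the number of cached keys, and flash-attn ≥ 2.1 aligns the causal mask to the bottom-right corner so the last query can attend to every key. A small plain-PyTorch illustration of the two alignments (not the kernel code):

```python
import torch

# 2 new query tokens attending over a 5-token KV cache
seqlen_q, seqlen_k = 2, 5
i = torch.arange(seqlen_q).unsqueeze(1)  # query positions
j = torch.arange(seqlen_k).unsqueeze(0)  # key positions

top_left = j <= i                              # pre-2.1 (top-left aligned) causal mask
bottom_right = j <= i + (seqlen_k - seqlen_q)  # >= 2.1 (bottom-right aligned) causal mask

print(top_left.int())      # [[1,0,0,0,0],
                           #  [1,1,0,0,0]]  -> queries barely see the cached keys
print(bottom_right.int())  # [[1,1,1,1,0],
                           #  [1,1,1,1,1]]  -> last query attends to all keys
```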