Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Does the new flash-attention support ROCm? #965

Open JiahuaZhao opened 4 months ago

JiahuaZhao commented 4 months ago

We followed https://github.com/ROCm/flash-attention to install flash_attn with ROCm support (the highest version currently available there is 2.0.4). When we run long-context inference (using LongLoRA), errors sometimes occur saying flash-attn version ≥ 2.1.0 is required. So we are wondering whether there is a newer version of flash_attn that supports ROCm.
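For reference, a minimal sketch of the kind of version check that fails here (assuming the installed package exposes `flash_attn.__version__`, which recent releases do):

```python
import flash_attn

# LongLoRA-style long-context code paths require flash-attn >= 2.1.0;
# the ROCm fork currently tops out at 2.0.4, so a check like this raises.
major, minor = (int(x) for x in flash_attn.__version__.split(".")[:2])
if (major, minor) < (2, 1):
    raise RuntimeError(
        f"flash-attn {flash_attn.__version__} found, but >= 2.1.0 is required"
    )
```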

tridao commented 4 months ago

Sorry, I don't know much about the ROCm version; you can ask on their repo.

rocking5566 commented 3 months ago

I just submitted a PR to add AMD / ROCm support to FlashAttention 2: https://github.com/Dao-AILab/flash-attention/pull/1010. This PR uses composable_kernel as the backend.
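If that PR lands as described, the expectation (an assumption, not something stated in this thread) is that the ROCm/composable_kernel build exposes the same Python API as the CUDA build, so existing callers would not need to change. A minimal usage sketch:

```python
import torch
from flash_attn import flash_attn_func

# flash-attn convention: q, k, v have shape (batch, seqlen, nheads, headdim),
# in fp16/bf16 on the GPU ("cuda" also maps to a ROCm/HIP device in AMD
# builds of PyTorch).
q = torch.randn(2, 4096, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 4096, 16, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 4096, 16, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)  # -> (2, 4096, 16, 64)
```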

ehartford commented 2 months ago

I believe this should be resolved now, @JiahuaZhao, thanks to https://github.com/Dao-AILab/flash-attention/pull/1010