ROCm / flash-attention

Fast and memory-efficient exact attention

Feature request: Sliding Window Attention #22

Open tjtanaa opened 10 months ago

tjtanaa commented 10 months ago

It would be wonderful if there were support for this feature, equivalent to what upstream flash-attention v2.3 provides. It would also enable support for the Mistral-7B model, one of the best open-source 7B model architectures.

May I know whether there is a plan to bump the ROCm flash-attention from v2.0.4 to v2.3?
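
For reference, this is roughly the API that upstream flash-attn v2.3 exposes for the feature being requested (a minimal sketch, assuming the usual `(batch, seqlen, nheads, headdim)` fp16/bf16 layout; the 4096-token window here is only an illustration matching Mistral-7B's configured sliding window):

```python
import torch
from flash_attn import flash_attn_func  # requires flash-attn >= 2.3 upstream

batch, seqlen, nheads, headdim = 2, 8192, 32, 128
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# window_size=(left, right): each query may attend to keys at most `left`
# positions behind it and `right` positions ahead of it; (-1, -1) disables
# windowing. With causal=True and a finite left window this gives the
# sliding-window attention used by Mistral-7B.
out = flash_attn_func(q, k, v, causal=True, window_size=(4096, 0))
```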

jayz0123 commented 10 months ago

Hi @tjtanaa, I am not sure when the sliding window feature will be implemented in this repo, because that depends on the feature being implemented in the CK backend. A new CK attention kernel with better performance is under development, and this flash-attention for ROCm will then be refactored on top of it. Until then it will stay at v2.0.4. You may also want to check with the Flash-Attention in PyTorch for ROCm to see whether their implementation is going to support that feature soon.

jamestwhedbee commented 7 months ago

Hey, just wanted to check in to see whether there are any updates on this?

fe1ixxu commented 7 months ago

+1

ehartford commented 5 months ago

Please, this is impacting customers.

> The current flash attention version does not support sliding window attention, for a more memory efficient implementation make sure to upgrade flash-attn library.
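
For clarity, here is a plain-PyTorch reference of what sliding-window attention computes (a sketch for illustration only, not this repo's kernel or any shipped API): each query position attends only to the most recent `window` keys, including itself.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention_ref(q, k, v, window):
    # q, k, v: (batch, nheads, seqlen, headdim)
    seqlen = q.shape[-2]
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    idx = torch.arange(seqlen, device=q.device)
    # Key j is visible to query i iff 0 <= i - j < window
    # (causal mask combined with the sliding window).
    visible = (idx[:, None] >= idx[None, :]) & (idx[:, None] - idx[None, :] < window)
    scores = scores.masked_fill(~visible, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), v)
```

A fused kernel implementing this mask is what keeps memory use independent of full sequence length; without it, models like Mistral-7B fall back to unwindowed attention.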

linchen111 commented 5 months ago

+1, Please~

Bellk17 commented 5 months ago

+1