ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Is this v2 or v1? #26

Closed netw0rkf10w closed 7 months ago

netw0rkf10w commented 7 months ago

Hello, thanks for your great work! I would like to know whether the current implementation is FA v1 or v2. If it's v1, are you planning to upgrade to v2? Thank you in advance for your replies.

howiejayz commented 7 months ago

Hi @netw0rkf10w, this repo is synchronized to v2.0.4 of the upstream one.
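
(If you want to confirm locally which release you have, a quick check is possible; this assumes the package was built and installed so that it is importable as `flash_attn`:)

```python
# Print the installed flash-attention version
# (assumes the package is importable as `flash_attn`).
import flash_attn

print(flash_attn.__version__)  # should report 2.0.4 at this repo's sync point
```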

netw0rkf10w commented 7 months ago

Thanks @howiejayz for your reply. Could you tell me how the current repo compares, in terms of performance, to the Triton implementation in your other fork? I'm trying to use flash attention on MI250x cards (and also MI300A ones) and am not sure which implementation I should use. Thank you in advance!

howiejayz commented 7 months ago

Hi @netw0rkf10w, I think you should try the Triton one if possible. This version of Flash-Attention for ROCm is relatively old, and its performance does not match the Triton implementation.
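
(If you want to compare the two yourself, a rough timing sketch like the one below can help. It uses the upstream v2 `flash_attn_func` entry point; the Triton fork exposes its own API, so you would swap in its call. Shapes, dtype, and iteration count here are illustrative assumptions, and PyTorch's ROCm backend exposes AMD GPUs through the `"cuda"` device type.)

```python
# Minimal forward-pass timing sketch for flash_attn_func on a ROCm GPU.
# Shapes and dtype are arbitrary assumptions for illustration.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 2048, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Warm up once so compilation/caching doesn't skew the measurement.
flash_attn_func(q, k, v, causal=True)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(10):
    out = flash_attn_func(q, k, v, causal=True)
end.record()
torch.cuda.synchronize()
print(f"avg forward: {start.elapsed_time(end) / 10:.3f} ms")
```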

netw0rkf10w commented 7 months ago

@howiejayz I see. Thanks a lot!