Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Is there a difference in computation speed between FlashAttention-3 and FlashAttention-2 when executed on an A100 GPU? #1114

Open · HongCodingGit opened this issue 2 months ago

HongCodingGit commented 2 months ago

Is there a difference in computation speed between FlashAttention-3 and FlashAttention-2 when executed on an A100 GPU?

tridao commented 2 months ago

No, FA2 is already close to optimal on A100
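
For context, one way to check this claim on your own A100 is to time `flash_attn_func` (the FA2 kernel) and compare the achieved throughput against the A100's roughly 312 TFLOPS BF16 tensor-core peak. The sketch below is a minimal benchmark, not one of the repo's own benchmarking scripts; the tensor shapes, iteration counts, and the non-causal FLOP formula (4 · batch · heads · seqlen² · headdim) are illustrative assumptions.

```python
# Minimal sketch: measure FA2 attention throughput on the current GPU.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 4, 4096, 32, 128  # assumed shapes
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.bfloat16) for _ in range(3))

# Warm up, then time with CUDA events.
for _ in range(10):
    flash_attn_func(q, k, v, causal=False)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 50
start.record()
for _ in range(iters):
    flash_attn_func(q, k, v, causal=False)
end.record()
torch.cuda.synchronize()
ms = start.elapsed_time(end) / iters  # milliseconds per iteration

# Non-causal attention FLOPs: two matmuls (QK^T and PV), 2 FLOPs per MAC.
flops = 4 * batch * nheads * seqlen * seqlen * headdim
print(f"{ms:.2f} ms/iter, {flops / (ms * 1e-3) / 1e12:.1f} TFLOPS")
```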

HongCodingGit commented 2 months ago

Thank you for your comment. So, does this mean that performing FlashAttention-3 on an A100 results in little to no performance improvement?

tridao commented 2 months ago

On an A100, FlashAttention-3 runs the same code as before (the FA2 kernels), so there is no speed difference.
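
In other words, FA3's new kernels target Hopper (sm90), so on an Ampere (sm80) part like the A100 you end up on the FA2 path either way. The snippet below is a hedged sketch of how one might dispatch between the two in user code, not the library's own logic; the `flash_attn_interface` import path and its exact return convention are assumptions that can vary by version.

```python
import torch
from flash_attn import flash_attn_func  # FA2 kernels (Ampere and newer)

def attention(q, k, v, causal=False):
    """Dispatch to FA3 on Hopper, FA2 otherwise (illustrative sketch only)."""
    major, _ = torch.cuda.get_device_capability(q.device)
    if major >= 9:
        # Hopper (H100): assumes the FA3 package from the repo's hopper/
        # directory is installed as `flash_attn_interface`. Some versions
        # return (out, lse) instead of just out; adjust if needed.
        from flash_attn_interface import flash_attn_func as flash_attn3_func
        return flash_attn3_func(q, k, v, causal=causal)
    # Ampere (A100) and other pre-Hopper GPUs: the FA2 kernels, which are
    # already close to the achievable peak on this hardware.
    return flash_attn_func(q, k, v, causal=causal)
```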