Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Is there a difference in computation speed between FlashAttention-3 and FlashAttention-2 when executed on an A100 GPU? #1114

Open · HongCodingGit opened this issue 2 months ago

HongCodingGit commented 2 months ago

Is there a difference in computation speed between FlashAttention-3 and FlashAttention-2 when executed on an A100 GPU?

tridao commented 2 months ago

No, FA2 is already close to optimal on A100
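
For context, one way to check this claim on your own A100 is to time `flash_attn_func` (the FA2 kernel) and compare the achieved throughput against the A100's roughly 312 TFLOPS BF16 tensor-core peak. The sketch below is a minimal benchmark, not one of the repo's own benchmarking scripts; the tensor shapes, iteration counts, and the non-causal FLOP formula (4 · batch · heads · seqlen² · headdim) are illustrative assumptions.

```python
# Minimal sketch: measure FA2 attention throughput on the current GPU.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 4, 4096, 32, 128  # assumed shapes
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.bfloat16) for _ in range(3))

# Warm up, then time with CUDA events.
for _ in range(10):
    flash_attn_func(q, k, v, causal=False)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 50
start.record()
for _ in range(iters):
    flash_attn_func(q, k, v, causal=False)
end.record()
torch.cuda.synchronize()
ms = start.elapsed_time(end) / iters  # milliseconds per iteration

# Non-causal attention FLOPs: two matmuls (QK^T and PV), 2 FLOPs per MAC.
flops = 4 * batch * nheads * seqlen * seqlen * headdim
print(f"{ms:.2f} ms/iter, {flops / (ms * 1e-3) / 1e12:.1f} TFLOPS")
```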

HongCodingGit commented 2 months ago

Thank you for your comment. So, does this mean that performing FlashAttention-3 on an A100 results in little to no performance improvement?

tridao commented 2 months ago

On an A100, FlashAttention-3 runs the same code as before (the FA2 kernels), so there is no speed difference.
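
In other words, FA3's new kernels target Hopper (sm90), so on an Ampere (sm80) part like the A100 you end up on the FA2 path either way. The snippet below is a hedged sketch of how one might dispatch between the two in user code, not the library's own logic; the `flash_attn_interface` import path and its exact return convention are assumptions that can vary by version.

```python
import torch
from flash_attn import flash_attn_func  # FA2 kernels (Ampere and newer)

def attention(q, k, v, causal=False):
    """Dispatch to FA3 on Hopper, FA2 otherwise (illustrative sketch only)."""
    major, _ = torch.cuda.get_device_capability(q.device)
    if major >= 9:
        # Hopper (H100): assumes the FA3 package from the repo's hopper/
        # directory is installed as `flash_attn_interface`. Some versions
        # return (out, lse) instead of just out; adjust if needed.
        from flash_attn_interface import flash_attn_func as flash_attn3_func
        return flash_attn3_func(q, k, v, causal=causal)
    # Ampere (A100) and other pre-Hopper GPUs: the FA2 kernels, which are
    # already close to the achievable peak on this hardware.
    return flash_attn_func(q, k, v, causal=causal)
```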