
FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Feature: Flash Attention 3 #369

Open · zhyncs opened 1 month ago

zhyncs commented 1 month ago

https://research.colfax-intl.com/flashattention-3-fast-and-accurate-attention-with-asynchrony-and-low-precision/

cc @yzh119

zhyncs commented 1 month ago

I just heard that FlashInfer has achieved a faster and more comprehensive version than Flash Attention 3, amazing! 👍 Looking forward to it!

yzh119 commented 1 month ago

> I just heard that FlashInfer has achieved a faster and more comprehensive version

I don't know that we have achieved that, lol. I'm indeed working on using CUTLASS to create an SM90 version of FlashInfer, but FA3's performance is really impressive (and better than my version at the moment).

FlashAttention-3 is indeed great work that we should learn from, and yes, I'll adopt its pipeline design to accelerate the page/sparse attention kernels accordingly.
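For context, the "pipeline design" in question is FA3's producer/consumer asynchrony: producer warps stage K/V tiles into shared memory (via TMA on Hopper) while consumer warpgroups run the matmul/softmax on tiles already staged. Below is a minimal, generic sketch of that copy/compute overlap using libcu++'s `cuda::pipeline` with double buffering. It is not FlashInfer's or FA3's actual implementation; the kernel and helper names (`pipelined_sum`, `consume`) and the `TILE` size are hypothetical, and warp specialization, TMA, and the real attention math are all omitted.

```cuda
#include <cuda/pipeline>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int TILE = 128;  // hypothetical tile width; real kernels use larger 2-D tiles

// Stand-in for the attention math (QK^T, online softmax, PV) that would run
// on a staged K/V tile; here each thread just accumulates its strided slice.
__device__ float consume(const float* tile, float acc) {
    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        acc += tile[i];
    return acc;
}

// Double-buffered copy/compute overlap: while the block computes on one
// shared-memory buffer, cuda::memcpy_async stages the next tile into the
// other. This is the generic software-pipelining idea behind FA3's
// producer/consumer design, minus warp specialization and TMA.
// Each block processes num_tiles contiguous tiles; out must be zeroed.
__global__ void pipelined_sum(const float* __restrict__ kv, int num_tiles,
                              float* __restrict__ out) {
    __shared__ float buf[2][TILE];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, 2> state;
    auto block = cg::this_thread_block();
    auto pipe = cuda::make_pipeline(block, &state);

    const float* base = kv + (size_t)blockIdx.x * num_tiles * TILE;
    float acc = 0.f;

    // Prologue: stage tile 0 (assumes num_tiles >= 1).
    pipe.producer_acquire();
    cuda::memcpy_async(block, buf[0], base, sizeof(float) * TILE, pipe);
    pipe.producer_commit();

    for (int t = 0; t < num_tiles; ++t) {
        if (t + 1 < num_tiles) {
            // Producer side: kick off the async copy of tile t+1.
            pipe.producer_acquire();
            cuda::memcpy_async(block, buf[(t + 1) % 2], base + (t + 1) * TILE,
                               sizeof(float) * TILE, pipe);
            pipe.producer_commit();
        }
        // Consumer side: wait until tile t is resident, then compute on it.
        // producer_acquire above blocks until the whole block has released a
        // stage, so a buffer is never overwritten while still being read.
        pipe.consumer_wait();
        acc = consume(buf[t % 2], acc);
        pipe.consumer_release();
    }
    atomicAdd(&out[blockIdx.x], acc);  // block-wide reduction kept trivial
}
```

The same overlap is what makes the technique attractive for paged/sparse attention: gathering non-contiguous pages has higher and more irregular latency than a dense streaming load, so staging the next page while computing on the current one hides more of that latency.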

zhyncs commented 1 month ago

> I don't know that we have achieved that, lol. I'm indeed working on using CUTLASS to create an SM90 version of FlashInfer

My phrasing may not have been very accurate; a more accurate way to say it is "under way". 😂