facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

When will xformers support Flash Attention 3? #1068

Open complexfilter opened 4 months ago

complexfilter commented 4 months ago

🚀 Feature

Support Flash Attention 3

Motivation

Flash Attention 3 has been shown to deliver large speedups over Flash Attention 2 on H100 GPUs.

Pitch

Add Flash Attention 3 support to xFormers.

danthe3rd commented 4 months ago

Hi, we're following this closely. The current implementation has some bugs which we've reported, but once it's ready it will be integrated :) https://github.com/Dao-AILab/flash-attention/issues/1052

dill-shower commented 3 months ago

> Hi, we're following this closely. The current implementation has some bugs which we've reported, but once it's ready it will be integrated :) Dao-AILab/flash-attention#1052

Looks like it has been fixed.

danthe3rd commented 3 months ago

Indeed :) We're working on this; hopefully we can land it in xFormers next week as an experimental feature.

danthe3rd commented 3 months ago

This is taking a bit more time than expected. Hopefully we will have it by next week, but we're not sure.

ultranity commented 3 weeks ago

> This is taking a bit more time than expected. Hopefully we will have it by next week, but we're not sure.

It seems the current code already includes an FA3 implementation? Any update on how to enable/disable it? @danthe3rd For now, `_USE_FLASH_ATTENTION_3 = False` is set by default in `ops/fmha/dispatch.py`.
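For anyone who wants to experiment, here is a minimal sketch of how one might flip that flag at runtime before calling `memory_efficient_attention`. It assumes the module-level `_USE_FLASH_ATTENTION_3` toggle in `xformers/ops/fmha/dispatch.py` mentioned above is still present and is read at dispatch time; it is a private attribute, so the name and behavior may change between releases.

```python
# Hedged sketch: opt in to the experimental FA3 dispatch path, assuming the
# private _USE_FLASH_ATTENTION_3 flag described in this thread still exists.
import torch
import xformers.ops as xops
from xformers.ops.fmha import dispatch as fmha_dispatch

# Guard with hasattr so this is a no-op on versions without the flag.
if hasattr(fmha_dispatch, "_USE_FLASH_ATTENTION_3"):
    fmha_dispatch._USE_FLASH_ATTENTION_3 = True  # default is False per the comment above

# Dummy q/k/v in BMHK layout: (batch, seq_len, num_heads, head_dim).
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The dispatcher picks the attention backend; with the flag set it may select
# the FA3 kernel when the hardware (e.g. H100) and inputs support it.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)  # torch.Size([1, 1024, 8, 64])
```

Whether FA3 is actually selected still depends on the dispatcher's internal checks (GPU architecture, dtype, head dimension), so this should be treated as an opt-in experiment rather than a guaranteed switch.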