Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Turing GPU support #720

Open sumanthnallamotu opened 11 months ago

sumanthnallamotu commented 11 months ago

In reference to the following on the main page:

"FlashAttention-2 currently supports: Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now. Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs). All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800."

How soon can we expect support for Turing GPUs? Some models I'd like to use are based on Mistral, which requires FlashAttention v2.

Specifically, I'm looking for support with T4.

Thank you!
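In the meantime, here is a minimal workaround sketch (my own assumption, not official Turing support): check the GPU's compute capability and only request the FlashAttention-2 backend on Ampere or newer (SM 8.0+). Turing cards such as the T4 report SM 7.5, so fall back to PyTorch's built-in scaled-dot-product attention there. The `attn_implementation` argument assumes a recent transformers release, and the Mistral checkpoint name is only an example.

```python
# Sketch: pick an attention backend based on the GPU's compute capability.
# FlashAttention-2 requires Ampere or newer (SM >= 8.0); a T4 is Turing (SM 7.5),
# so we fall back to PyTorch's scaled-dot-product attention ("sdpa") there.
import torch
from transformers import AutoModelForCausalLM

def pick_attn_implementation() -> str:
    if torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 0):
        return "flash_attention_2"
    return "sdpa"

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",          # example checkpoint
    torch_dtype=torch.float16,
    attn_implementation=pick_attn_implementation(),
)
```

The SDPA path will be slower than FlashAttention-2, but it at least lets Mistral-based models load and run on a T4 until Turing support lands here.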

WingsLong commented 11 months ago

Me too!

IvoryTower800 commented 11 months ago

Me too!

online2311 commented 10 months ago

Me too!

laoda513 commented 10 months ago

me too, too LoL

mirh commented 3 months ago

Duplicate of #542

zbh2047 commented 2 weeks ago

Me too!

giopaglia commented 2 weeks ago

And me too :+1: