I'm using FlashAttention 1.0.9, which supports Turing GPUs (2080 Ti), but I get this error:
RuntimeError: FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.
The main constraint is the amount of shared memory.
As the error above says, the backward pass for head dim > 64 requires an A100 or H100. The forward pass for head dim <= 128, and the backward pass for head dim <= 64, work on other GPUs (including Turing).
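If it helps, here is a minimal sketch of a guard you can run before training to check whether your configuration fits these limits. The helper name is hypothetical (it is not part of the flash-attn API), and the mapping of A100 to sm80 and H100 to sm90 is an assumption based on the error message:

```python
import torch

def flash_attn_supported(head_dim: int, backward: bool = True) -> bool:
    """Hypothetical guard encoding the FlashAttention 1.x constraint
    quoted above: backward for head dim > 64 needs an A100 (assumed
    sm80) or H100 (assumed sm90); forward up to head dim 128 and
    backward up to head dim 64 work on other GPUs, e.g. Turing (sm75,
    such as the 2080 Ti)."""
    cap = torch.cuda.get_device_capability()
    if backward and head_dim > 64:
        # Large-head-dim backward needs the bigger shared memory
        # of A100/H100.
        return cap in ((8, 0), (9, 0))
    return head_dim <= 128
```

So on a 2080 Ti you could either keep the head dim at 64 or below (e.g. use more heads with a smaller dim per head) if you need the backward pass, or run forward-only (inference) with head dim up to 128.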