Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Turing architecture error on Nvidia Quadro T1000 #1230

Open Tortoise17 opened 1 month ago

Tortoise17 commented 1 month ago

I am facing this error:

RuntimeError: FlashAttention only supports Ampere GPUs or newer.

while the architecture is Turing. Is there any tip to resolve it? The GPU is an NVIDIA Quadro T1000.

Kindly help.

Carnyzzle commented 1 month ago

FlashAttention 1.x supports Turing; FlashAttention 2.x does not support Turing as of right now.
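For reference, a minimal sketch of how calling code can gate FlashAttention on compute capability (Turing is SM 7.5, Ampere is SM 8.0+) and fall back to PyTorch's built-in `scaled_dot_product_attention` on older cards. This assumes PyTorch 2.x and flash-attn 2.x are installed; the `attention` helper name is illustrative, not part of either library.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, causal=False):
    """q, k, v: (batch, seqlen, nheads, headdim), fp16/bf16 tensors on CUDA."""
    major, _ = torch.cuda.get_device_capability(q.device)
    if major >= 8:
        # Ampere (SM 8.0) or newer: flash-attn 2.x is supported.
        from flash_attn import flash_attn_func
        return flash_attn_func(q, k, v, causal=causal)
    # Turing (SM 7.5) and older: fall back to PyTorch's SDPA,
    # which expects (batch, nheads, seqlen, headdim).
    q, k, v = (x.transpose(1, 2) for x in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return out.transpose(1, 2)
```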

Tortoise17 commented 1 month ago

@Carnyzzle Thank you. I downgraded to flash_attn 1.x and still get the same error. If you could mention specifically which version resolves it, that would be a great help.
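A quick check, as a sketch, to confirm which flash_attn build is actually being imported at runtime and what compute capability PyTorch reports for the card (the T1000 is Turing, SM 7.5). This assumes PyTorch and flash-attn are importable in the active environment.

```python
import torch
import flash_attn

print("flash_attn version:", flash_attn.__version__)
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))  # T1000 reports (7, 5)
```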