Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Turing architecture error on Nvidia Quadro T1000 #1230

Open · Tortoise17 opened this issue 2 months ago

Tortoise17 commented 2 months ago

I am facing this error:

RuntimeError: FlashAttention only supports Ampere GPUs or newer.

while my GPU's architecture is Turing. Is there any tip to resolve it? The GPU is an NVIDIA T1000.

Kindly help.

Carnyzzle commented 2 months ago

FlashAttention 1.x supports Turing; FlashAttention 2.x doesn't support Turing as of right now.
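A minimal sketch (not from this thread) of how one might check the situation at runtime: FlashAttention 2.x requires compute capability 8.0 (Ampere) or newer, while Turing cards such as the T1000 report 7.5. The `flash_attn_func` import follows the 2.x package layout; the `<2` pin is one assumed way to stay on the 1.x series.

```python
import torch

# Turing (e.g. T1000) reports compute capability (7, 5);
# FlashAttention 2.x needs (8, 0) or newer (Ampere/Hopper).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")

if major >= 8:
    # Ampere or newer: the 2.x package can be used.
    from flash_attn import flash_attn_func
else:
    # Turing or older: either pin the 1.x series, e.g.
    #   pip install "flash-attn<2"
    # or fall back to PyTorch's built-in SDPA (see the snippet below).
    flash_attn_func = None
```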

Tortoise17 commented 2 months ago

@Carnyzzle Thank you. I downgraded to flash_attn 1.x and still get the same error. If you could specifically mention which version resolves it, that would be a great help.
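One possible workaround on Turing, assuming PyTorch 2.0 or newer is installed (this is a sketch, not something confirmed in the thread), is to skip the flash-attn package entirely and use PyTorch's built-in `scaled_dot_product_attention`, which dispatches to a memory-efficient kernel where the GPU supports one:

```python
import torch
import torch.nn.functional as F

# Dummy tensors shaped (batch, heads, seq_len, head_dim), fp16 on the GPU.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# PyTorch picks the best available backend for the current GPU
# (memory-efficient attention, math fallback, or flash where supported).
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```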