66RING / tiny-flash-attention

flash attention tutorial written in python, triton, cuda, cutlass

Does the cutlass version support sm75? #9

Open A-transformer opened 1 month ago

A-transformer commented 1 month ago

Is your code tested on the sm75 architecture?

66RING commented 1 month ago

I have not tested the sm75 series, since the official FA implementation requires at least sm80, i.e. a GPU with Ampere architecture or newer.

https://github.com/Dao-AILab/flash-attention/blob/bedf8774677315c5eb7e640eca6d7aa15e87775a/setup.py#L169
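Below is a minimal sketch of how one might check locally whether a GPU meets the sm80 (Ampere) requirement referenced above. It assumes PyTorch with CUDA is installed and is only an illustration, not part of this repository or of the official flash-attention build logic.

```python
# Sketch: check whether the local GPU is at least sm80 (Ampere),
# the minimum the official flash-attention setup targets.
# Assumes a PyTorch installation with CUDA support.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Detected compute capability: sm{major}{minor}")
    if (major, minor) >= (8, 0):
        print("Ampere or newer: official flash-attention kernels should be supported.")
    else:
        print("Older than sm80 (e.g. sm75 / Turing): not supported by the official kernels.")
else:
    print("No CUDA device detected.")
```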