66RING / tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Tiny FlashAttention

WIP

A tiny flash attention implementation in Python, Rust, CUDA, and C for learning purposes.
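For readers new to the idea, here is a minimal pure-Python sketch of what flash attention computes. It is not the code from this repo: the function names `naive_attention` and `tiled_attention` and the `block_size` parameter are made up for illustration, and it only covers the forward pass for plain lists of floats. The tiled version walks over K/V one block at a time and keeps a running max and running sum (the "online softmax" trick), so the full row of scores never has to be stored.

```python
import math


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, building every full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Same result, but K/V are visited block by block with an online softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = float("-inf")  # max score seen so far
        running_sum = 0.0            # softmax denominator, relative to running_max
        acc = [0.0] * len(v[0])      # unnormalized output, relative to running_max
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the old max to the new max.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [
                a * correction + sum(w * vj[c] for w, vj in zip(weights, v_blk))
                for c, a in enumerate(acc)
            ]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Both functions should agree up to floating-point rounding for any `block_size >= 1`. A real kernel also tiles over the query rows and keeps the blocks in on-chip memory, which this sketch does not try to show.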

CUTLASS CuTe flash attention in action

My env: CUTLASS v3.4, torch 1.14, CUDA 12.4