ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Optimized API for packed conditions #12

Closed guangzlu closed 1 year ago

guangzlu commented 1 year ago

Optimized the API for packed inputs: tensor copies are removed in the qkvpacked and kvpacked cases. Attached: kv-packed-compare.xlsx, qkv-packed-compare.xlsx

Running time can be reduced when using qkvpacked and kvpacked inputs.
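For readers unfamiliar with why dropping the copies helps, here is a minimal sketch of the general idea, not the code from this PR. It uses NumPy as a stand-in for the tensor library and assumes a packed layout of (batch, seqlen, 3, nheads, headdim); the function names `split_qkv_with_copy` and `split_qkv_as_views` are hypothetical. Slicing the packed array yields strided views that share storage with it, whereas forcing each slice to be contiguous allocates and fills three new buffers.

```python
import numpy as np


def split_qkv_with_copy(qkv):
    """Unpack by materializing three contiguous arrays (extra allocations and memory traffic)."""
    q = np.ascontiguousarray(qkv[:, :, 0])
    k = np.ascontiguousarray(qkv[:, :, 1])
    v = np.ascontiguousarray(qkv[:, :, 2])
    return q, k, v


def split_qkv_as_views(qkv):
    """Unpack by slicing only; each result is a strided view into qkv, with no copy."""
    return qkv[:, :, 0], qkv[:, :, 1], qkv[:, :, 2]


# Assumed packed layout: (batch, seqlen, 3, nheads, headdim)
batch, seqlen, nheads, headdim = 2, 4, 3, 8
rng = np.random.default_rng(0)
qkv = rng.standard_normal((batch, seqlen, 3, nheads, headdim)).astype(np.float32)

q_copy, k_copy, v_copy = split_qkv_with_copy(qkv)
q_view, k_view, v_view = split_qkv_as_views(qkv)

copies_share = np.shares_memory(q_copy, qkv)  # copies own separate storage
views_share = np.shares_memory(q_view, qkv)   # views alias the packed buffer
same_values = np.array_equal(q_copy, q_view)  # both paths hold identical data
```

The view-based path only works if the downstream kernel accepts strided inputs; whether and how the kernels in this repository do so is not shown here.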