FlagOpen / FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.

Change the causal masking when seqlen_q > seqlen_kv #12

Closed iclementine closed 9 months ago

iclementine commented 9 months ago

Change the causal masking when seqlen_q > seqlen_kv.

Before: (image)

After: (image)

When seqlen_q > seqlen_kv and causal masking is applied, some queries cannot attend to any keys, so their corresponding output rows are zero.

After this PR, the causal masking matches flash_attn 2.1+. (See also https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag)
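Under the 2.1+ convention the causal mask is aligned to the bottom-right corner of the attention matrix: query i may attend key j iff j <= i + seqlen_kv - seqlen_q. A minimal NumPy sketch of this mask (the function name is illustrative, not part of the FlagAttention API):

```python
import numpy as np

def bottom_right_causal_mask(seqlen_q: int, seqlen_kv: int) -> np.ndarray:
    """Boolean mask (True = attend) aligned to the bottom-right corner,
    following the flash_attn 2.1+ causal convention."""
    i = np.arange(seqlen_q)[:, None]   # query positions
    j = np.arange(seqlen_kv)[None, :]  # key positions
    return j <= i + seqlen_kv - seqlen_q

# With seqlen_q=5, seqlen_kv=2, the first three query rows are all False:
# those queries attend no keys, so their attention output is zero.
mask = bottom_right_causal_mask(5, 2)
print(mask.astype(int))
```

This reproduces the example from the flash-attention README: for seqlen_q=5 and seqlen_kv=2 the kept entries are [[0,0],[0,0],[0,0],[1,0],[1,1]].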