Change the causal masking when seqlen_q > seqlen_kv

FlagOpen / FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.


Closed iclementine closed 9 months ago

iclementine commented 9 months ago

Change the causal masking when seqlen_q > seqlen_kv.

Before: (figure: causal mask before the change)

After: (figure: causal mask after the change)

When seqlen_q > seqlen_kv and causal masking is applied, some queries cannot attend to any key, so the corresponding rows of the output are zero.
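The behavior described above can be illustrated with a small NumPy reference. This is only a sketch, not the Triton kernel from this repository: it assumes the causal mask is aligned to the bottom-right corner of the score matrix (query `i` may attend key `j` when `j <= i + seqlen_kv - seqlen_q`), and the names `causal_mask` and `reference_attention` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def causal_mask(seqlen_q, seqlen_kv):
    # Assumption: bottom-right aligned causal mask.
    # Query i may attend key j when j <= i + (seqlen_kv - seqlen_q).
    i = np.arange(seqlen_q)[:, None]
    j = np.arange(seqlen_kv)[None, :]
    return j <= i + (seqlen_kv - seqlen_q)

def reference_attention(q, k, v, causal=True):
    # q: (seqlen_q, d), k and v: (seqlen_kv, d); single head, no batch.
    seqlen_q, d = q.shape
    seqlen_kv = k.shape[0]
    scores = (q @ k.T) / np.sqrt(d)
    if causal:
        mask = causal_mask(seqlen_q, seqlen_kv)
    else:
        mask = np.ones((seqlen_q, seqlen_kv), dtype=bool)
    scores = np.where(mask, scores, -np.inf)

    # For a fully masked row the row max is -inf; use 0 instead so that
    # the subtraction below does not produce NaN.
    row_max = scores.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    p = np.exp(scores - row_max)  # masked entries become exactly 0
    denom = p.sum(axis=1, keepdims=True)

    # Fully masked rows have denom == 0; their output is defined as zero.
    safe_denom = np.where(denom > 0, denom, 1.0)
    return (p / safe_denom) @ v

# seqlen_q = 5 > seqlen_kv = 3: under the assumed alignment, the first
# two queries see no keys at all.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))
k = rng.standard_normal((3, 4))
v = rng.standard_normal((3, 4))
out = reference_attention(q, k, v, causal=True)
```

Under this assumed alignment, the first `seqlen_q - seqlen_kv` rows of `out` are all zeros rather than NaN, which is the reason for guarding the softmax denominator explicitly.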