Closed iclementine closed 9 months ago
Change the causal masking when seqlen_q > seqlen_kv.
[Figure: causal mask before and after this change]
When `seqlen_q > seqlen_kv` and causal masking is applied, some `q`s cannot attend to any `k`s, so the corresponding output rows are zero.
After this PR, the causal masking is the same as flash_attn 2.1+. (See also https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag)
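
As a rough illustration of the flash_attn 2.1+ convention referenced above, the sketch below builds a causal mask aligned to the bottom-right corner of the attention matrix and lists the query rows that end up fully masked. The function names `causal_mask_bottom_right` and `fully_masked_rows` are hypothetical helpers written for this example; they are not part of this PR or of flash_attn.

```python
def causal_mask_bottom_right(seqlen_q, seqlen_kv):
    """Return a seqlen_q x seqlen_kv mask (1 = keep, 0 = masked out).

    Assumption: the mask is aligned to the bottom-right corner, as described
    in the flash_attn 2.1+ changelog, so query i may attend to key j only
    when j <= i + (seqlen_kv - seqlen_q).
    """
    offset = seqlen_kv - seqlen_q
    return [
        [1 if j <= i + offset else 0 for j in range(seqlen_kv)]
        for i in range(seqlen_q)
    ]


def fully_masked_rows(mask):
    """Indices of query rows that cannot attend to any key."""
    return [i for i, row in enumerate(mask) if not any(row)]


if __name__ == "__main__":
    mask = causal_mask_bottom_right(seqlen_q=5, seqlen_kv=2)
    for row in mask:
        print(" ".join(str(v) for v in row))
    # Rows listed here have no visible keys, so their attention output is zero.
    print("fully masked query rows:", fully_masked_rows(mask))
```

With `seqlen_q=5` and `seqlen_kv=2`, the first three query rows have no visible keys, which matches the zero-output case described above.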