Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Does flash_attn have an interface like xformers's BlockDiagonalCausalMask? #765

Closed janelu9 closed 10 months ago

janelu9 commented 10 months ago

[image attachment]

janelu9 commented 10 months ago

It flattens the batch_size dim into the seq_len dim.

tridao commented 10 months ago

Take a look at flash_attn_varlen_func. This is what xformers calls, I think.
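
(For reference, a minimal sketch of calling flash_attn_varlen_func with sequences packed along one token dimension and boundaries given by cumulative sequence lengths; the shapes and sequence lengths below are illustrative, not taken from this thread.)

```python
import itertools
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
seqlens = [5, 3, 7]        # three sequences of different lengths (illustrative)
total = sum(seqlens)       # all tokens flattened into a single dim

# q/k/v: (total_tokens, nheads, headdim), fp16/bf16 on GPU
q = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")
v = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")

# cu_seqlens: int32 cumulative offsets marking sequence boundaries, e.g. [0, 5, 8, 15]
cu_seqlens = torch.tensor([0] + list(itertools.accumulate(seqlens)),
                          dtype=torch.int32, device="cuda")
max_seqlen = max(seqlens)

# causal=True gives block-diagonal causal attention: each sequence attends
# only to its own earlier tokens, never across sequence boundaries.
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
    causal=True,
)
# out: (total_tokens, nheads, headdim)
```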

janelu9 commented 10 months ago

> Take a look at flash_attn_varlen_func. This is what xformers calls, I think.

Thanks, I think so too.