Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Does flash_attn have an interface like xformers's BlockDiagonalCausalMask? #765

Closed janelu9 closed 10 months ago

janelu9 commented 10 months ago

[image attachment]

janelu9 commented 10 months ago

It flattens the batch_size dim into the seq_len dim.

tridao commented 10 months ago

Take a look at flash_attn_varlen_func. This is what xformers calls, I think.
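
(For reference, a minimal sketch of calling flash_attn_varlen_func with sequences packed along one token dimension and boundaries given by cumulative sequence lengths; the shapes and sequence lengths below are illustrative, not taken from this thread.)

```python
import itertools
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
seqlens = [5, 3, 7]        # three sequences of different lengths (illustrative)
total = sum(seqlens)       # all tokens flattened into a single dim

# q/k/v: (total_tokens, nheads, headdim), fp16/bf16 on GPU
q = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")
v = torch.randn(total, nheads, headdim, dtype=torch.float16, device="cuda")

# cu_seqlens: int32 cumulative offsets marking sequence boundaries, e.g. [0, 5, 8, 15]
cu_seqlens = torch.tensor([0] + list(itertools.accumulate(seqlens)),
                          dtype=torch.int32, device="cuda")
max_seqlen = max(seqlens)

# causal=True gives block-diagonal causal attention: each sequence attends
# only to its own earlier tokens, never across sequence boundaries.
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
    causal=True,
)
# out: (total_tokens, nheads, headdim)
```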

janelu9 commented 10 months ago

> Take a look at flash_attn_varlen_func. This is what xformers calls, I think.

Thanks, I think so too.