facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

BlockDiagonalGappyKeysMask backward support #1093

Open cumulo-autumn opened 2 months ago

cumulo-autumn commented 2 months ago

❓ Questions and Help

Hi, I am trying to use xformers.ops.fmha.attn_bias.BlockDiagonalGappyKeysMask to create a block-causal attn_bias like the following:

[[1, 1, 0, 0, 0, 0],
 [1, 1, 0, 0, 0, 0],
 [1, 1, 1, 1, 0, 0],
 [1, 1, 1, 1, 0, 0],
 [1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1]]

I confirmed that this can be achieved with BlockDiagonalGappyKeysMask.from_seqlens([2,2,2], [0,0,0,6], [2,4,6]). However, I am not sure whether this class also supports the backward pass for training. Is there any way to tell from the documentation which classes are supported for both inference and backward? A usage sketch of the construction is shown below.
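For reference, a minimal forward-pass sketch of the construction above, assuming a CUDA device, fp16 inputs, and the packed [batch, total_tokens, heads, head_dim] layout that memory_efficient_attention expects; the head count and head dim are illustrative, and whether backward works is exactly the open question here.

```python
import torch
import xformers.ops as xops
from xformers.ops.fmha.attn_bias import BlockDiagonalGappyKeysMask

# Same positional call as in the question:
# three query blocks of 2 tokens; block i attends to keys [0, 2*(i+1)).
attn_bias = BlockDiagonalGappyKeysMask.from_seqlens(
    [2, 2, 2],     # query length per block
    [0, 0, 0, 6],  # key start offsets (one extra entry marking the end)
    [2, 4, 6],     # key length per block
)

B, S, H, D = 1, 6, 4, 64  # illustrative shapes; all sequences packed along S
q = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias)
```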

Also, is there an alternative way to realize an attn_bias with the shape described above? Thank you in advance.

danthe3rd commented 2 months ago

Hi, this bias is unfortunately only supported for inference (forward pass only, cc @bottler). If you want to do something similar for training, you might be able to do it with Flex-Attention (cc @chillee); a sketch of that route follows.
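A minimal sketch of the same block-causal pattern with FlexAttention, assuming PyTorch 2.5+ (torch.nn.attention.flex_attention) and a CUDA device; the block size and tensor shapes below are illustrative placeholders, not values from the thread.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Three blocks of BLOCK tokens each (the 2-token blocks from the question,
# scaled up so the sequence length is kernel-friendly).
BLOCK = 128
B, H, S, D = 1, 4, 3 * BLOCK, 64

def block_causal(b, h, q_idx, kv_idx):
    # Allow attention when the key's block index is <= the query's block index.
    return (kv_idx // BLOCK) <= (q_idx // BLOCK)

block_mask = create_block_mask(block_causal, B=B, H=H, Q_LEN=S, KV_LEN=S, device="cuda")

q = torch.randn(B, H, S, D, device="cuda", requires_grad=True)
k = torch.randn(B, H, S, D, device="cuda", requires_grad=True)
v = torch.randn(B, H, S, D, device="cuda", requires_grad=True)

out = flex_attention(q, k, v, block_mask=block_mask)
out.sum().backward()  # backward is supported, which is the point of switching here
```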

cumulo-autumn commented 2 months ago

Okay, I will take a look at Flex-Attention. Thanks!