Open cumulo-autumn opened 2 months ago
Hi, this bias is unfortunately only supported for inference (forward pass only, cc @bottler). If you want to do something similar for training, you might be able to do that with Flex-Attention (cc @chillee).
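Not an official recipe, just a minimal sketch of what the Flex-Attention route could look like for the block-causal pattern discussed in this issue (assuming PyTorch >= 2.5 with CUDA and a block size of 2; all shapes are illustrative):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

BLOCK = 2  # causal block size; 2 matches the 6x6 mask in this issue

def block_causal(b, h, q_idx, kv_idx):
    # Query block i may attend to key blocks 0..i (block-wise causal).
    return (kv_idx // BLOCK) <= (q_idx // BLOCK)

B, H, S, D = 1, 4, 6, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3))

# Note: older PyTorch versions may require sequence lengths padded to the kernel block size.
mask = create_block_mask(block_causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=mask)  # supports both forward and backward
```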
Okay, I will take a look at Flex-Attention. Thanks!
❓ Questions and Help
Hi, I am trying to use xformers.ops.fmha.attn_bias.BlockDiagonalGappyKeysMask to create a block-causal attn_bias like the following:
[[1, 1, 0, 0, 0, 0],
 [1, 1, 0, 0, 0, 0],
 [1, 1, 1, 1, 0, 0],
 [1, 1, 1, 1, 0, 0],
 [1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1]]
I confirmed that this can be achieved by BlockDiagonalGappyKeysMask.from_seqlens([2,2,2], [0,0,0,6], [2,4,6]). However, I am not sure whether this class also supports backward for training. Is there any way to tell from the documentation which attn_bias classes are supported for both inference and backward?
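In the absence of a single table in the docs, one rough way to check empirically is to run a small forward plus backward and see whether xformers finds a backward operator for the bias. This is only a smoke test, assuming a CUDA build of xformers; the shapes and dtype below are illustrative:

```python
import torch
import xformers.ops as xops
from xformers.ops.fmha.attn_bias import BlockDiagonalGappyKeysMask

# Illustrative shapes: batch 1, total sequence length 6, 4 heads, head dim 64.
q = torch.randn(1, 6, 4, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(1, 6, 4, 64, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(1, 6, 4, 64, device="cuda", dtype=torch.float16, requires_grad=True)

bias = BlockDiagonalGappyKeysMask.from_seqlens([2, 2, 2], [0, 0, 0, 6], [2, 4, 6])

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
try:
    out.sum().backward()
    print("backward is supported for this bias")
except Exception as e:
    # xformers raises if no backward operator accepts this attn_bias type
    print("backward not supported:", e)
```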
Also, is there any alternative way to realize an attn_bias with the shape described above? Thank you in advance.
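For what it's worth, one alternative outside xformers is to materialize the block-causal pattern as a boolean mask and pass it to torch.nn.functional.scaled_dot_product_attention, which does support backward. This gives up the memory savings of a block-diagonal bias but trains without issue; a sketch with block size 2 assumed:

```python
import torch
import torch.nn.functional as F

BLOCK, S = 2, 6
idx = torch.arange(S)
# True where attention is allowed: query block i sees key blocks 0..i.
allowed = (idx[None, :] // BLOCK) <= (idx[:, None] // BLOCK)  # (S, S) bool

B, H, D = 1, 4, 64
q = torch.randn(B, H, S, D, requires_grad=True)
k = torch.randn(B, H, S, D, requires_grad=True)
v = torch.randn(B, H, S, D, requires_grad=True)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=allowed)
out.sum().backward()  # gradients flow, so this path is usable for training
```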