OpenNLPLab / cosFormer

[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

Attn Mask for Non-causal Models #5

Open roshansh-cmu opened 2 years ago

roshansh-cmu commented 2 years ago

We are examining non-NLP applications of the cosFormer self-attention and need to apply an attention mask over the padded tokens in a batch. Is there a way to incorporate this? The code does not explicitly compute the attention weights to which masking is traditionally applied.
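(Not part of the repository's code; a minimal sketch of how a key padding mask could be folded into a kernelized linear attention of this form. Because the output factors as phi(q_i) applied to the sums over phi(k_j) v_j and phi(k_j), zeroing phi(k) at padded positions removes those tokens from both the numerator and the normalizer, which is equivalent to masking those columns of the attention matrix. The function name and the `key_padding_mask` convention are assumptions for illustration; the cos-based reweighting of cosFormer is omitted for brevity.)

```python
import torch

def linear_attention_with_padding(q, k, v, key_padding_mask=None, eps=1e-6):
    """Simplified non-causal linear attention with a key padding mask (sketch).

    q, k, v:            (batch, seq_len, dim)
    key_padding_mask:   (batch, seq_len) bool, True at padded positions (assumed convention)
    """
    # Non-negative feature map; cosFormer additionally applies a cos-based
    # reweighting, which is left out here to keep the sketch short.
    q = torch.relu(q)
    k = torch.relu(k)

    if key_padding_mask is not None:
        # Zero out padded keys: they then contribute nothing to either the
        # numerator (kv) or the denominator (normalizer).
        k = k.masked_fill(key_padding_mask.unsqueeze(-1), 0.0)

    # (batch, dim, dim): sum_j phi(k_j) v_j^T
    kv = torch.einsum("bnd,bne->bde", k, v)
    # (batch, seq_len): phi(q_i)^T sum_j phi(k_j)
    z = torch.einsum("bnd,bd->bn", q, k.sum(dim=1)).clamp(min=eps)
    # (batch, seq_len, dim)
    out = torch.einsum("bnd,bde->bne", q, kv) / z.unsqueeze(-1)
    return out
```

Outputs at padded query positions are still computed under this scheme and would typically be discarded or zeroed downstream; masking the queries in the same way is another option.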

Doraemonzzz commented 2 years ago

> We are examining non-NLP applications of the cosFormer self-attention and need to apply an attention mask over the padded tokens in a batch. Is there a way to incorporate this? The code does not explicitly compute the attention weights to which masking is traditionally applied.

Can you provide some examples before/after masking?

npzl commented 1 year ago

For example, the attention mask used in the Swin Transformer (the attached image did not upload).
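(For reference, a minimal sketch of how such a pairwise mask is conventionally applied in standard softmax attention, which presumably is what the missing image illustrated. The function name and the mask convention are assumptions for illustration.)

```python
import torch

def masked_softmax_attention(q, k, v, attn_mask=None):
    """Standard softmax attention with an explicit pairwise mask, e.g. the
    (seq_len, seq_len) mask produced by Swin Transformer's shifted windows.

    q, k, v:   (batch, seq_len, dim)
    attn_mask: (seq_len, seq_len) bool, True where attention is disallowed (assumed convention)
    """
    scores = torch.einsum("bnd,bmd->bnm", q, k) / q.shape[-1] ** 0.5
    if attn_mask is not None:
        # Applying the mask requires the full (n, m) score matrix, which a
        # kernelized attention like cosFormer never materializes.
        scores = scores.masked_fill(attn_mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.einsum("bnm,bmd->bnd", weights, v)
```

A per-token padding mask can be absorbed into the keys as in the earlier sketch, but a general pairwise mask like this one does not factor over individual tokens, so it cannot be folded into the linear-time formulation without additional structure.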