OpenNLPLab / cosFormer

[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

Why the attn mask is not used in forward function? #6

Open HanielF opened 2 years ago

HanielF commented 2 years ago

Compared with the left_product function, the attention mask is not used in the forward() function. How can the attention mask be applied in the forward method?

Doraemonzzz commented 2 years ago

When using the forward() function, there is no direct way to apply an attention mask, since the attention matrix is never explicitly computed. If you need an attention mask, we suggest using left_product; however, this comes at a loss in efficiency.
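
For anyone finding this later, here is a minimal sketch (my own, not the repo's API; the function names and the boolean `attn_mask` convention are assumptions) of why the two computation orders differ: the left product materializes the n × n attention matrix, so a mask can be applied element-wise, while the linear order contracts k and v first, so that matrix never exists and there is nothing to mask.

```python
import torch

def left_product_attention(q, k, v, attn_mask=None):
    """O(n^2) order: materializes the (n, n) attention matrix,
    so an attention mask can be applied element-wise.
    q, k, v: (n, d) tensors; q and k are assumed non-negative
    (e.g. after the ReLU feature map used in cosFormer)."""
    scores = q @ k.transpose(-2, -1)        # (n, n) attention matrix
    if attn_mask is not None:               # attn_mask: (n, n) bool, True = keep
        scores = scores.masked_fill(~attn_mask, 0.0)
    denom = scores.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return (scores / denom) @ v             # (n, d)

def linear_attention(q, k, v):
    """O(n) order: contracts k and v first, so the (n, n)
    matrix is never formed and cannot be masked directly."""
    kv = k.transpose(-2, -1) @ v            # (d, d) summary of keys/values
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (n, 1) normalizer
    return (q @ kv) / z.clamp(min=1e-6)     # (n, d)

# Without a mask, the two orders agree up to numerical error:
n, d = 8, 4
q, k, v = torch.rand(n, d), torch.rand(n, d), torch.randn(n, d)
out_left = left_product_attention(q, k, v)
out_linear = linear_attention(q, k, v)
assert torch.allclose(out_left, out_linear, atol=1e-5)
```

So masking requires falling back to the quadratic left-product path, which is exactly the efficiency trade-off mentioned above.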