airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Why having a self-attn after cross_attn #102

Open LeungWaiHo opened 3 years ago

LeungWaiHo commented 3 years ago

Firstly, thanks for your excellent work. I have a question: why do you add a self-attn module after each cross-attn module? What is it there for? Looking forward to your reply~ Thanks again~

airsplay commented 3 years ago

The additional self-attention module helps fuse information inside the same modality.

As for the order of the modules, the answer is simple: we compared the self-->cross and cross-->self orders and empirically found that cross-->self is more stable (the results are almost the same).
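
For readers new to the architecture, here is a minimal PyTorch sketch of the cross-->self ordering described above. It is only an illustration: the layer sizes, the use of `nn.MultiheadAttention`, and the omission of residual connections, layer norms, and feed-forward sub-layers are simplifying assumptions, not the repository's actual modeling code.

```python
import torch
import torch.nn as nn

class CrossModalityLayerSketch(nn.Module):
    """Simplified sketch of one cross-modality layer in the
    cross-->self order: cross-attention between modalities first,
    then self-attention within each modality. Not the actual
    LXMERT implementation (no residuals, layer norms, or FFNs)."""

    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        # Cross-attention: each modality attends to the other.
        self.lang_cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.visn_cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Self-attention: fuse information inside the same modality.
        self.lang_self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.visn_self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, lang_feats, visn_feats):
        # 1) Cross-attention: language queries attend to vision keys/values,
        #    and vision queries attend to language keys/values.
        lang_x, _ = self.lang_cross_attn(lang_feats, visn_feats, visn_feats)
        visn_x, _ = self.visn_cross_attn(visn_feats, lang_feats, lang_feats)
        # 2) Self-attention within each modality, spreading the newly
        #    mixed-in cross-modal information among same-modality tokens.
        lang_out, _ = self.lang_self_attn(lang_x, lang_x, lang_x)
        visn_out, _ = self.visn_self_attn(visn_x, visn_x, visn_x)
        return lang_out, visn_out

# Example usage with dummy batch sizes/sequence lengths (assumed values):
layer = CrossModalityLayerSketch()
lang = torch.randn(2, 20, 768)   # (batch, language tokens, hidden)
visn = torch.randn(2, 36, 768)   # (batch, visual regions, hidden)
lang_out, visn_out = layer(lang, visn)
```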