Open LeungWaiHo opened 3 years ago
The additional self-attn module helps fuse information within the same modality.
As for the order of the modules, the answer is simple: we compared the self-->cross and cross-->self orders and empirically found that cross-->self is more stable (the final results of the two orders are almost the same).
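To make the cross-->self order concrete, here is a minimal PyTorch sketch. It is not the code from this repository: the class name `CrossThenSelfBlock`, the residual + LayerNorm layout, and the sizes (`dim=64`, `num_heads=4`) are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class CrossThenSelfBlock(nn.Module):
    """Hypothetical block: cross-attention followed by self-attention."""

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Cross-attention: queries come from x, keys/values from the other modality.
        fused, _ = self.cross_attn(query=x, key=other, value=other)
        x = self.norm1(x + fused)
        # Self-attention: mix the fused features within the same modality.
        refined, _ = self.self_attn(query=x, key=x, value=x)
        return self.norm2(x + refined)


# Toy inputs (assumed shapes): batch of 2, 10 tokens in one modality, 7 in the other.
block = CrossThenSelfBlock()
a = torch.randn(2, 10, 64)
b = torch.randn(2, 7, 64)
out = block(a, b)  # same shape as a
```

Swapping the two attention calls inside `forward` would give the self-->cross variant for comparison.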
First of all, thanks for your excellent work. I have a question: why do you add a self-attn module after each cross-attn module? What is it for? Looking forward to your reply. Thanks again!