Does the visn-lang-attention share the same weights as the lang-visn-attention in the cross layers? I wonder whether the performance could be better if they used different weights. Did you try it?
Yes, I tried it. The results with and without sharing are almost the same; sharing is slightly better (~0.5%) on downstream tasks, so I share the weights to save parameters.
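For anyone curious what "sharing" means concretely, here is a minimal PyTorch-style sketch, assuming a standard multi-head attention module. The class name `SharedCrossAttention` and the shapes are illustrative only, not the actual classes in this repo: the same attention module (and thus the same projection weights) is applied in both directions of the cross layer.

```python
import torch
import torch.nn as nn

class SharedCrossAttention(nn.Module):
    """Illustrative cross layer: one attention module reused for both directions."""
    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        # A single attention module; reusing it halves the cross-attention parameters.
        self.cross_att = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, lang_feats, visn_feats):
        # Language attends to vision (lang as query, visn as key/value) ...
        lang_out, _ = self.cross_att(lang_feats, visn_feats, visn_feats)
        # ... and vision attends to language with the very same weights.
        visn_out, _ = self.cross_att(visn_feats, lang_feats, lang_feats)
        return lang_out, visn_out

# Toy usage: batch of 2, 20 language tokens, 36 visual regions, hidden size 768.
lang = torch.randn(2, 20, 768)
visn = torch.randn(2, 36, 768)
lang_out, visn_out = SharedCrossAttention()(lang, visn)
print(lang_out.shape, visn_out.shape)  # torch.Size([2, 20, 768]) torch.Size([2, 36, 768])
```

The non-sharing variant would simply instantiate two separate attention modules, one per direction, at roughly twice the parameter cost for this sub-layer.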
Sorry for the bother.