airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

object/attr prediction not masked #90

Closed. zhmd closed this issue 3 years ago

zhmd commented 3 years ago

Thanks a lot for sharing your code, it really helped!

A quick question regarding the visual objectives (predicting objects/attributes for Faster R-CNN regions). If I understand the loss calculation correctly, the feats_mask applies only to feature regression, not to object/attribute prediction. So even if a patch is randomly zeroed out or replaced by another patch, the model still has to predict the original object/attribute labels for it.
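To make the question concrete, here is a minimal sketch of the behavior described above. It is not the repository's code: `visual_losses`, `cross_entropy`, the argument names, and the averaging over regions are all assumptions made for illustration, and attribute prediction is omitted because it would mirror the object branch.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy of one prediction against an integer label."""
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[label]


def visual_losses(pred_feats, target_feats, obj_logits, obj_labels, feats_mask):
    """Hypothetical helper: return (feat_loss, obj_loss) for one image.

    feats_mask[i] is 1.0 if region i was zeroed out or replaced, else 0.0.
    """
    n = len(feats_mask)

    # Feature regression: per-region mean squared error, weighted by the
    # mask, so only masked regions contribute.
    feat_loss = 0.0
    for pred, target, m in zip(pred_feats, target_feats, feats_mask):
        sq_err = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
        feat_loss += m * sq_err
    feat_loss /= n

    # Object prediction: no mask is applied, so every region (masked or
    # not) is scored against its original label.
    obj_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(obj_logits, obj_labels)
    ) / n
    return feat_loss, obj_loss


if __name__ == "__main__":
    feat_loss, obj_loss = visual_losses(
        pred_feats=[[1.0, 1.0], [2.0, 2.0]],
        target_feats=[[0.0, 0.0], [0.0, 0.0]],
        obj_logits=[[0.0, 0.0], [0.0, 0.0]],
        obj_labels=[0, 1],
        feats_mask=[1.0, 0.0],  # only the first region is masked
    )
    print(feat_loss, obj_loss)
```

In this sketch, changing `feats_mask` changes only the first returned value; the object loss is the same whether or not a region was masked, which is the behavior I am asking about.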

Is that correct? Am I missing something here?

Many thanks!

zhmd commented 3 years ago

Sorry, I just noticed that this is a duplicate of #41. Closing.