airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

visual_attention_mask is set to None during pre-training? #91

Closed: zhmd closed this issue 3 years ago

zhmd commented 3 years ago

Thanks for this great repo!

I'm curious why, during pre-training, visual_attention_mask never seems to be passed to LXRTModel, as shown in the lines linked below: https://github.com/airsplay/lxmert/blob/0db1182b9030da3ce41f17717cc628e1cd0a95d5/src/lxrt/modeling.py#L924-927
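Schematically, the call at those lines looks something like the following (a paraphrase for discussion, not the verbatim source; names follow the linked file). Since visual_attention_mask is a keyword argument that is never supplied, it falls back to its None default:

```python
# Paraphrased sketch of the pre-training forward call in src/lxrt/modeling.py.
# visual_attention_mask is never passed here, so LXRTModel.forward receives None.
output = self.bert(
    input_ids, token_type_ids, attention_mask,
    visual_feats=(visual_feats, pos),
)
```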

The signature of LXRTModel is defined here:

https://github.com/airsplay/lxmert/blob/0db1182b9030da3ce41f17717cc628e1cd0a95d5/src/lxrt/modeling.py#L845-L846
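For reference, the signature at those lines is roughly the following (defaults assumed from context; check the source for the exact declaration), so the visual mask is optional and simply unused when omitted:

```python
# Sketch of LXRTModel.forward as declared at the linked lines.
def forward(self, input_ids, token_type_ids=None, attention_mask=None,
            visual_feats=None, visual_attention_mask=None):
    ...
```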

If I'm understanding correctly, visual_attention_mask should be the feat_mask built at this line, which is saved in object_labels['feat'][1]: https://github.com/airsplay/lxmert/blob/0db1182b9030da3ce41f17717cc628e1cd0a95d5/src/pretrain/lxmert_pretrain.py#L178
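If that reading is right, the change in question would look roughly like the sketch below. This is purely hypothetical (not a patch from the repo) and assumes feat_mask has shape (batch, num_objects) with 1 for regions to attend to:

```python
# Hypothetical: forward the feature mask as the visual attention mask.
feat, feat_mask = object_labels['feat']    # as packed in lxmert_pretrain.py
output = self.bert(
    input_ids, token_type_ids, attention_mask,
    visual_feats=(visual_feats, pos),
    visual_attention_mask=feat_mask,       # currently this kwarg is left as None
)
```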

@airsplay, would you mind clarifying why visual_attention_mask is not used in the LXRT encoder?
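For context on what the argument would do if supplied: modeling.py appears to follow the standard BERT convention of turning a 0/1 mask into an additive bias on the attention logits, so a None mask means every visual region is attended to. A minimal, self-contained illustration of that convention (my own sketch, not repo code):

```python
import torch

def extend_mask(mask: torch.Tensor) -> torch.Tensor:
    """Convert a (batch, num_objects) 0/1 keep-mask into an additive
    attention bias, following the standard BERT convention."""
    ext = mask[:, None, None, :].float()   # (batch, 1, 1, num_objects)
    return (1.0 - ext) * -10000.0          # masked positions get a large negative bias

mask = torch.tensor([[1, 1, 0]])           # third region would be ignored
print(extend_mask(mask))                   # bias is -10000.0 only at the masked slot
```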