Closed zhmd closed 3 years ago
Thanks for this great repo!
I'm curious: for pre-training, it seems `visual_attention_mask` is never passed to `LXRTModel`, as shown in these lines: https://github.com/airsplay/lxmert/blob/0db1182b9030da3ce41f17717cc628e1cd0a95d5/src/lxrt/modeling.py#L924-L927
The signature of `LXRTModel` is defined here:
https://github.com/airsplay/lxmert/blob/0db1182b9030da3ce41f17717cc628e1cd0a95d5/src/lxrt/modeling.py#L845-L846
If I'm understanding correctly, the `visual_attention_mask` should be the `feat_mask` from this line: https://github.com/airsplay/lxmert/blob/0db1182b9030da3ce41f17717cc628e1cd0a95d5/src/pretrain/lxmert_pretrain.py#L178, which is saved in `object_labels['feat'][1]`?
@airsplay, could you clarify why the `visual_attention_mask` is not used in the LXRT encoder?
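For context, here is a minimal sketch of what I would expect if the mask were used: build a per-example mask over the object slots and convert it to a BERT-style additive bias on the attention logits. The helper names (`make_visual_attention_mask`, `extend_mask`) are illustrative, not from the repo.

```python
import torch

def make_visual_attention_mask(num_boxes, max_boxes):
    """Return a (batch, max_boxes) mask: 1.0 for real boxes, 0.0 for padding.

    num_boxes: list of actual object counts per example.
    (Hypothetical helper for illustration, not part of the LXMERT code.)
    """
    mask = torch.zeros(len(num_boxes), max_boxes)
    for i, n in enumerate(num_boxes):
        mask[i, :n] = 1.0
    return mask

def extend_mask(mask):
    """BERT-style conversion: broadcastable (batch, 1, 1, seq) bias where
    padded positions get a large negative value added to attention logits."""
    extended = mask.unsqueeze(1).unsqueeze(2)
    return (1.0 - extended) * -10000.0

mask = make_visual_attention_mask([2, 3], max_boxes=4)
ext = extend_mask(mask)
# Real boxes contribute a 0.0 bias; padded slots get -10000.0,
# which drives their attention weights to ~0 after softmax.
```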