airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Can I use -load_lxmert instead of -load_lxmert_qa when fine-tuning the VQA model? #72

Open yikuan8 opened 4 years ago

yikuan8 commented 4 years ago

Thank you for the great repo.

I am trying to fine-tune the pre-trained lxmert weights on my own VQA dataset.

Can I use -load_lxmert instead of -load_lxmert_qa when fine-tuning the VQA model? Actually, I am also not clear with what the QA head is.

Thank you


I got the following log when using -load_lxmert:

Weights in loaded but not in model: answer_head.logit_fc.0.bias answer_head.logit_fc.0.weight answer_head.logit_fc.2.bias answer_head.logit_fc.2.weight answer_head.logit_fc.3.bias answer_head.logit_fc.3.weight cls.predictions.bias cls.predictions.decoder.weight cls.predictions.transform.LayerNorm.bias cls.predictions.transform.LayerNorm.weight cls.predictions.transform.dense.bias cls.predictions.transform.dense.weight cls.seq_relationship.bias cls.seq_relationship.weight obj_predict_head.decoder_dict.attr.bias obj_predict_head.decoder_dict.attr.weight obj_predict_head.decoder_dict.feat.bias obj_predict_head.decoder_dict.feat.weight obj_predict_head.decoder_dict.obj.bias obj_predict_head.decoder_dict.obj.weight obj_predict_head.transform.LayerNorm.bias obj_predict_head.transform.LayerNorm.weight obj_predict_head.transform.dense.bias obj_predict_head.transform.dense.weight

Weights in model but not in loaded:

airsplay commented 4 years ago

It is possible. Loading with the option -load_lxmert will not load the pre-trained answer classifier (called the QA head, following the naming convention of detection systems and BERT models) for the QA tasks, but fine-tuning from it reaches very similar results, at the cost of a longer training time.
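The log above is the usual PyTorch pattern of loading a checkpoint with `strict=False` and reporting the key mismatch in both directions. A minimal sketch with hypothetical modules (a shared `Encoder`, a `VQAModel` that adds an answer head, and a made-up pre-training-only key standing in for the `cls.*` / `obj_predict_head.*` weights) shows where the two lists come from:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: an encoder shared by pre-training and fine-tuning,
# plus a QA head that only the fine-tuned VQA model owns.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(8, 8)

class VQAModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.answer_head = nn.Linear(8, 4)  # the "QA head"

# Pre-trained checkpoint: encoder weights plus a pre-training-only head
# (standing in for cls.* / obj_predict_head.* in the log above).
pretrained = Encoder()
ckpt = {f"encoder.{k}": v for k, v in pretrained.state_dict().items()}
ckpt["cls.predictions.bias"] = torch.zeros(8)  # pre-training-only weight

model = VQAModel()
# strict=False tolerates the mismatch and returns it instead of raising.
result = model.load_state_dict(ckpt, strict=False)
print("Weights in loaded but not in model:", result.unexpected_keys)
print("Weights in model but not in loaded:", result.missing_keys)
```

With -load_lxmert the QA head therefore shows up under "in model but not in loaded" and starts from random initialization, which is why training takes longer than with -load_lxmert_qa but converges to similar accuracy.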