airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

"pre-training" section in the readme #47

Open johntiger1 opened 4 years ago

johntiger1 commented 4 years ago

Just want to confirm: when you talk about "pre-training" in the readme (https://github.com/airsplay/lxmert#pre-training), do you mean training the entire LXMERT model from scratch?

If we just want to use a trained LXMERT model (and stick a classification or LSTM layer on the end), we can simply download the pre-trained model you provided (http://nlp.cs.unc.edu/data/model_LXRT.pth), load it, freeze its weights, and then finetune on our specific task, right?
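
For concreteness, here's a rough sketch of what I have in mind, using the LXRTEncoder wrapper from src/lxrt/entry.py; the constructor arguments, the load() path handling, and the linear head are just my assumptions, not necessarily how you intended it:

```python
import torch.nn as nn

from lxrt.entry import LXRTEncoder  # wrapper in src/lxrt/entry.py around the pre-trained encoder


class FrozenLXMERTClassifier(nn.Module):
    """Frozen pre-trained LXMERT encoder with a small trainable task head."""

    def __init__(self, args, num_labels, max_seq_length=20):
        super().__init__()
        self.lxrt_encoder = LXRTEncoder(args, max_seq_length=max_seq_length)
        # Load the released weights; the task scripts in this repo seem to pass
        # the path without the "_LXRT.pth" suffix (assumption about load() here).
        self.lxrt_encoder.load("snap/pretrained/model")
        # Freeze the encoder so only the classification head is updated.
        for p in self.lxrt_encoder.parameters():
            p.requires_grad = False
        # Hypothetical linear head on the pooled cross-modality feature.
        self.head = nn.Linear(self.lxrt_encoder.dim, num_labels)

    def forward(self, feats, boxes, sents):
        # Pooled cross-modal vector of size self.lxrt_encoder.dim.
        x = self.lxrt_encoder(sents, (feats, boxes))
        return self.head(x)
```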

Thanks

airsplay commented 4 years ago

johntiger1 commented 4 years ago
  1. Thanks! It was a little confusing since the BERT component can also be pre-trained. :)

  2. Thanks for the insight! It's good to know what worked experimentally. It's always worth checking whether finetuning just the last layer or finetuning the entire model works better, and there are of course trade-offs as well with regard to training time, etc. (see the sketch at the end of this comment).

I suspect we see this behaviour with your model in particular because of how many different moving parts it has.
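
To make the comparison in point 2 concrete, here is roughly how the two regimes differ in setup, continuing the hypothetical frozen-encoder model from my earlier sketch (the optimizer choice and learning rates are just typical ballpark values, not the repo's exact settings):

```python
import torch

# Assumes `model` is the FrozenLXMERTClassifier sketched above.

# (a) Head-only finetuning: the encoder stays frozen, only the new head trains.
head_only_optim = torch.optim.Adam(model.head.parameters(), lr=1e-3)

# (b) Full finetuning: unfreeze the encoder and train everything end-to-end,
#     typically with a much smaller, BERT-style learning rate.
for p in model.lxrt_encoder.parameters():
    p.requires_grad = True
full_optim = torch.optim.Adam(model.parameters(), lr=5e-5)
```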