"pre-training" section in the readme

airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

MIT License

923 stars 157 forks source link

"pre-training" section in the readme #47

Open johntiger1 opened 4 years ago

johntiger1 commented 4 years ago

Just want to confirm, when you talk about "pre-training" in the readme (https://github.com/airsplay/lxmert#pre-training) you mean training the entire LXMERT model from scratch?

If we just want to use a trained LXMERT model (and stick on a classification or LSTM layer at the end), we can just use the pre-trained model link you provided: http://nlp.cs.unc.edu/data/model_LXRT.pth, load your model, freeze the weights and then finetune with our specific task, right?

Thanks

airsplay commented 4 years ago

Yes. The script is to train the model from scratch.
Mostly correct. As I tested before, the fine-tuned model would work better if the backbone weight are not frozen. It is the same to BERT.

johntiger1 commented 4 years ago

Thanks! It was a little confusing since the BERT component can also be pre-trained. :)
Thanks for the insight! It's good to know experimentally what worked. It is always a good question to see whether just finetuning the last layer vs finetuning the entire model works better. And there are of course trade-offs as well, with regards to training time etc.

I suspect we see this behaviour in your model in particular due to the complexity of the different moving parts.