airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

How to generate the multi-model fusion results #57

Open runzeer opened 4 years ago

runzeer commented 4 years ago
Sorry to trouble you again! I saw your multi-model fusion results on the GQA leaderboard. I trained the same model with 5 different random seeds, but the results are only 0.2% higher than the single model. If convenient, could you share your method?
airsplay commented 4 years ago

Hmmmm... The ensemble is tricky... I don't think I found the best way to do the model ensemble.

What I mainly did was fine-tune the model from different pre-training epochs and with a few different pre-training methods, as shown in the paper.

The pre-training snapshots are available here:

https://nlp1.cs.unc.edu/data/github_pretrain/lxmert20/EpochXX_LXRT.pth, XX from 01 to 20.
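For reference, a minimal sketch of fetching one of those snapshots and loading it into a model. The `load_snapshot` helper is hypothetical, and the `module.` key-prefix stripping is an assumption (checkpoints saved from `nn.DataParallel` carry that prefix); verify the actual key layout against the repo's own loading code.

```python
import torch
import urllib.request

def load_snapshot(model, epoch):
    """Hypothetical helper: download a pre-training snapshot (epoch 1..20)
    and load its weights into an existing LXMERT model."""
    name = f"Epoch{epoch:02d}_LXRT.pth"
    url = "https://nlp1.cs.unc.edu/data/github_pretrain/lxmert20/" + name
    urllib.request.urlretrieve(url, name)

    state_dict = torch.load(name, map_location="cpu")
    # Strip a possible DataParallel "module." prefix from the keys (assumption).
    state_dict = {
        (k[len("module."):] if k.startswith("module.") else k): v
        for k, v in state_dict.items()
    }
    # strict=False because fine-tuning heads may not exist in the snapshot.
    model.load_state_dict(state_dict, strict=False)
    return model
```

Each loaded snapshot would then be fine-tuned separately, and the resulting models' predictions ensembled (see the sketch further down in this thread).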
airsplay commented 4 years ago

BTW, I do not encourage using model ensembles... They cannot prove the effectiveness of the methods.

runzeer commented 4 years ago

Could you share the pre-training methods that differ from the ones shown in your paper? I trained according to what you said, but the result is only 0.3% higher.

airsplay commented 4 years ago

They were deleted to save space. The hyperparameters needed to reproduce them are available in Tables 3, 4, and 5 of the paper.

runzeer commented 4 years ago

As for the fusion strategy in your method, do you average or vote over the different seeds?

airsplay commented 4 years ago

I tried averaging over probabilities, averaging over logits, and majority voting. It seems that averaging over probabilities wins.
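To make the three strategies concrete, here is a minimal, self-contained sketch. The names (`all_logits`, `ensemble`) are illustrative, and it assumes each model produces one logit vector per question over the same answer vocabulary:

```python
import torch

def ensemble(all_logits, strategy="avg_prob"):
    """all_logits: list of [num_questions, num_answers] tensors, one per model.
    Returns the index of the predicted answer for each question."""
    logits = torch.stack(all_logits)  # [num_models, Q, A]
    if strategy == "avg_prob":
        # Average over probabilities: softmax each model first, then mean.
        probs = torch.softmax(logits, dim=-1).mean(dim=0)
        return probs.argmax(dim=-1)
    elif strategy == "avg_logit":
        # Average over logits: mean the raw scores, then argmax.
        return logits.mean(dim=0).argmax(dim=-1)
    elif strategy == "vote":
        # Majority voting: each model votes for its argmax answer.
        votes = logits.argmax(dim=-1)  # [num_models, Q]
        return votes.mode(dim=0).values
    raise ValueError(f"unknown strategy: {strategy}")
```

One plausible reason averaging probabilities wins here: the softmax bounds each model's contribution, so a single over-confident model with large-magnitude logits cannot dominate the ensemble the way it can when raw logits are averaged.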

runzeer commented 3 years ago

Are the pre-training snapshots available? https://nlp1.cs.unc.edu/data/github_pretrain/lxmert20/EpochXX_LXRT.pth, XX from 01 to 20.