runzeer opened 4 years ago
Hmmmm... the ensemble is tricky. I don't think I found the best way to do model ensembling.
What I mainly did was fine-tune the model from different pre-training epochs and with a few different pre-training methods, as shown in the paper.
The pre-training snapshots are available here:
https://nlp1.cs.unc.edu/data/github_pretrain/lxmert20/EpochXX_LXRT.pth, with XX from 01 to 20.
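For convenience, here is a minimal sketch that fetches all of the snapshots, assuming the URL pattern above holds for every epoch (the helper itself is not part of the repo):

```python
# Download the 20 pre-training snapshots listed above.
# Assumes file names Epoch01_LXRT.pth ... Epoch20_LXRT.pth at the given URL pattern.
import urllib.request

BASE = "https://nlp1.cs.unc.edu/data/github_pretrain/lxmert20/Epoch{:02d}_LXRT.pth"

for epoch in range(1, 21):
    url = BASE.format(epoch)
    out = f"Epoch{epoch:02d}_LXRT.pth"
    print(f"Downloading {url} -> {out}")
    urllib.request.urlretrieve(url, out)
```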
BTW, I do not encourage using the model ensemble... it cannot demonstrate the effectiveness of the methods themselves.
Could you share the pre-training methods that differ from the ones shown in your paper? I trained just as you described, but the result is only 0.3% higher.
They were deleted to save space. The hyperparameters needed to reproduce the results are available in Tables 3, 4, and 5 of the paper.
As for the fusion strategy for your methods: do you average or vote over different seeds?
I tried averaging over probabilities, averaging over logits, and majority voting. It seems that averaging over probabilities wins.
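A minimal sketch of those three fusion strategies, assuming `logits_list` holds one `[batch, num_answers]` tensor per fine-tuned model (the `fuse` helper and strategy names are hypothetical, not from the repo):

```python
import torch

def fuse(logits_list, strategy="avg_prob"):
    stacked = torch.stack(logits_list)   # [num_models, batch, num_answers]
    if strategy == "avg_prob":           # average the softmax probabilities, then pick the argmax
        return stacked.softmax(dim=-1).mean(dim=0).argmax(dim=-1)
    if strategy == "avg_logit":          # average the raw logits, then pick the argmax
        return stacked.mean(dim=0).argmax(dim=-1)
    if strategy == "vote":               # majority vote over each model's own prediction
        preds = stacked.argmax(dim=-1)   # [num_models, batch]
        return preds.mode(dim=0).values
    raise ValueError(f"unknown strategy: {strategy}")
```

Averaging probabilities puts every model on the same [0, 1] scale before combining, which may be why it edges out logit averaging (whose scale varies per model) and hard voting (which discards confidence).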
> Are the pre-training snapshots available?

https://nlp1.cs.unc.edu/data/github_pretrain/lxmert20/EpochXX_LXRT.pth, with XX from 01 to 20.