facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Big performance gap between my validation and the paper on Hateful Memes #931

Open honghaiwen opened 3 years ago

honghaiwen commented 3 years ago

❓ Questions and Help

Update: I found that my validation is on dev_unseen while the paper reports dev_seen, so that is the source of the gap. But I wonder why accuracy increases on the dev_unseen set? Thank you a lot!
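For anyone hitting the same mismatch, below is a sketch of how one might point validation at a specific split. The checkpoint path is a placeholder, and the annotation override key is an assumption that may differ across MMF versions, so please check against your installed config.

```bash
# Sketch (not verified against every MMF version): validate an MMBT baseline
# on dev_seen explicitly instead of the default val annotations.
# /path/to/your_trained_model.pth is a placeholder for your own checkpoint.
mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
    model=mmbt dataset=hateful_memes run_type=val \
    checkpoint.resume_file=/path/to/your_trained_model.pth \
    dataset_config.hateful_memes.annotations.val[0]=hateful_memes/defaults/annotations/dev_seen.jsonl
```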


Original Question:

My training is on a machine with 8 V100 32GB GPUs, and I set the batch size and LR according to the paper "The Hateful Memes Challenge" (Appendix A). I didn't change anything in the MMF code, but my validation AUROC is not as good as the paper's results, while my accuracy is better than the paper's.

For example:

| Model | AUROC (mine) | AUROC (paper) | Acc (mine) | Acc (paper) |
|---|---|---|---|---|
| Image-Grid | 57.16 | 58.79 | 60.66% | 52.73% |
| Text-Bert | 57.97 | 64.65 | 59.01% | 58.26% |
| Late-Fusion | 63.78 | 65.79 | 65.99% | 61.53% |
| Concat-Bert | 63.41 | 65.25 | 63.60% | 58.60% |
| MMBT-Grid | 64.39 | 68.57 | 65.81% | 58.20% |

I downloaded the source code directly from GitHub and installed it according to the docs, and the commands I used follow the Hateful Memes README exactly, with almost no changes to the source code. I don't know at which step I made a mistake and would appreciate an answer. Thanks a lot!
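For context, the README-style command I am running looks roughly like the sketch below. The batch size, learning rate, and save directory shown here are placeholders, not the actual Appendix A values, so substitute your own settings.

```bash
# Sketch of a README-style training run for one baseline (MMBT-Grid here).
# training.batch_size / optimizer.params.lr are placeholders; use the values
# from Appendix A of the paper. env.save_dir is an arbitrary output directory.
mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
    model=mmbt dataset=hateful_memes run_type=train_val \
    training.batch_size=32 optimizer.params.lr=5e-5 \
    env.save_dir=./save/mmbt_grid
```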

shivgodhia commented 3 years ago

Similar issue for me, see my results. Most of them are close, but Text BERT is definitely off.

[screenshot of results]
vedanuj commented 3 years ago

The results reported here are an average of multiple runs with different seeds, so they might not be exactly the same when you run with a specific seed.
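To approximate that protocol, one option is to repeat training with a few fixed seeds and average the validation metrics yourself. A minimal sketch, assuming `training.seed` is used to fix MMF's random seed and using arbitrary per-seed save directories:

```bash
# Sketch: train the same baseline with several seeds, keeping each run's
# outputs separate, then average the reported val AUROC/accuracy by hand.
# Seed values and save-dir naming are arbitrary choices for this example.
for SEED in 1 2 3; do
    mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
        model=mmbt dataset=hateful_memes run_type=train_val \
        training.seed=${SEED} env.save_dir=./save/mmbt_seed_${SEED}
done
```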