facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Are the seen or unseen val & test sets being reported in the Hateful Memes paper? #900

Closed: shivgodhia closed this issue 3 years ago

shivgodhia commented 3 years ago

Hi @apsdehal

I wanted to ask why there seems to be such high variance on the test set for Text BERT; I reproduce the results here. Could you clarify which test set and val set (seen or unseen?) the results are for? I noticed that the paper on arXiv was updated last week, and it's unclear what actually changed in the paper or why the reported numbers changed.

I also ran inference using Text BERT and uploaded the CSV to DrivenData to evaluate the model, getting Acc 0.6020 and AUROC 0.6552. This is for test seen.
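For reproducibility, this is roughly how I generated that CSV. It is a sketch of the documented mmf_predict workflow; the config path and checkpoint zoo key are my reading of the hateful_memes project README, so please verify them against your MMF version:

```python
# Sketch: produce a test-set predictions CSV for the Text BERT baseline
# via MMF's documented mmf_predict entry point. The config path and the
# checkpoint.resume_zoo key below are assumptions taken from the
# hateful_memes project README and may differ across MMF versions.
import subprocess

subprocess.run(
    [
        "mmf_predict",
        "config=projects/hateful_memes/configs/unimodal/bert.yaml",
        "model=unimodal_text",
        "dataset=hateful_memes",
        "run_type=test",
        "checkpoint.resume_zoo=unimodal_text.hateful_memes.bert",
    ],
    check=True,  # raise if the run fails
)
```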

Also, if the test set has changed from seen to unseen between v2 and v3 of the arXiv paper, why is human accuracy still exactly the same at 84.70? Surely that is a strange coincidence?

[screenshot of the reported results]

Lastly, how do I evaluate on the test sets (both seen and unseen) other than by uploading a CSV to DrivenData? It seems the Phase 2 evaluations have closed.
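In case it helps others while Phase 2 is closed: for any split whose labels ship with the dataset (e.g. dev_seen.jsonl), you can score a submission CSV locally. A minimal sketch, assuming the DrivenData column layout (`id`, `proba`, `label`) and illustrative file paths:

```python
# Sketch: score a submission CSV locally against a split whose labels you
# have (dev_seen.jsonl ships with labels; the test annotations may not).
# The file paths and the column names "id", "proba", "label" are
# assumptions based on the DrivenData submission format.
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score

preds = pd.read_csv("submission.csv")              # columns: id, proba, label
gold = pd.read_json("dev_seen.jsonl", lines=True)  # columns include: id, label

# Align predictions with ground truth on the meme id.
merged = preds.merge(gold[["id", "label"]], on="id", suffixes=("_pred", "_true"))

acc = accuracy_score(merged["label_true"], merged["label_pred"])
auroc = roc_auc_score(merged["label_true"], merged["proba"])
print(f"Acc {acc:.4f}, AUROC {auroc:.4f}")
```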

Thanks for your time in answering my questions!

douwekiela commented 3 years ago

The NeurIPS paper on arXiv covers the seen evaluation sets. The competition report, currently under review and coming out soon, will cover the unseen evaluation set.