facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Hateful memes baselines don't seem to be predicting correctly #288

Closed · josephch405 closed 4 years ago

josephch405 commented 4 years ago

According to the docs under the Hateful Memes directory, I should be able to run

mmf_predict config=<REPLACE_WITH_BASELINE_CONFIG> \
  model=<REPLACE_WITH_MODEL_KEY> \
  dataset=hateful_memes \
  run_type=test \
  checkpoint.resume_zoo=<REPLACE_WITH_PRETRAINED_ZOO_KEY>

and it should output a reasonably performant CSV for submission. Specifically, we are running the VisualBERT baseline with:

mmf_predict config=projects/hateful_memes/configs/visual_bert/defaults.yaml \
  model=visual_bert \
  dataset=hateful_memes \
  run_type=test \
  checkpoint.resume_zoo=visual_bert.finetuned.hateful_memes.from_coco

We chose defaults.yaml since it seems that running with the from_pretrained flag via from_coco.yaml was only meant for training (inference with that config gave variable predictions).

Running the mmf_run variant of the above command on validation gives a good AUROC (~0.73). However, when we submit the test CSV we have been getting AUROC scores on the order of ~0.3, which seems rather odd. Is this the intended behavior? Are we not using the right configs here? We have also tried training our own models with from_coco.yaml as a starting point, but we likewise see low test AUROC despite high validation scores. We strongly suspect something is going wrong in the inference flow, but on inspection nothing seems clearly incorrect.
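For reference, the validation run we used was essentially the mmf_predict command above swapped to mmf_run with run_type=val (sketched here assuming the same config and checkpoint):

mmf_run config=projects/hateful_memes/configs/visual_bert/defaults.yaml \
  model=visual_bert \
  dataset=hateful_memes \
  run_type=val \
  checkpoint.resume_zoo=visual_bert.finetuned.hateful_memes.from_coco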

vedanuj commented 4 years ago

Thanks @josephch405 for raising this issue. This should be fixed now. Please install mmf from the latest master and try it out.
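For reference, one common way to pick up the latest master is a direct pip install from GitHub (assuming a standard pip setup):

pip install --upgrade git+https://github.com/facebookresearch/mmf.git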