facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

[Hateful Memes] Unstable validation on pretrained baseline model zoo #330

Closed jinhyun95 closed 4 years ago

jinhyun95 commented 4 years ago

❓ Questions and Help

When executing the following command to evaluate the pretrained model zoo, the results differ on every run, and none of them matches the numbers reported in https://arxiv.org/abs/2005.04790.

CUDA_VISIBLE_DEVICES=0 mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes run_type=val checkpoint.resume_zoo=visual_bert.finetuned.hateful_memes.from_coco

run 1:

2020-06-15T09:20:54 | INFO | mmf.train : progress: 0/22000, val/total_loss: 0.6715, val/hateful_memes/cross_entropy: 0.6715, val/hateful_memes/accuracy: 0.6100, val/hateful_memes/binary_f1: 0.5714, val/hateful_memes/roc_auc: 0.6486

run 2:

2020-06-15T09:21:47 | INFO | mmf.train : progress: 0/22000, val/total_loss: 0.8715, val/hateful_memes/cross_entropy: 0.8715, val/hateful_memes/accuracy: 0.5000, val/hateful_memes/binary_f1: 0.6649, val/hateful_memes/roc_auc: 0.4461

run 3:

2020-06-15T09:24:34 | INFO | mmf.train : progress: 0/22000, val/total_loss: 0.6607, val/hateful_memes/cross_entropy: 0.6607, val/hateful_memes/accuracy: 0.6680, val/hateful_memes/binary_f1: 0.7233, val/hateful_memes/roc_auc: 0.6720

run 4:

2020-06-15T09:25:36 | INFO | mmf.train : progress: 0/22000, val/total_loss: 0.7030, val/hateful_memes/cross_entropy: 0.7030, val/hateful_memes/accuracy: 0.5460, val/hateful_memes/binary_f1: 0.6789, val/hateful_memes/roc_auc: 0.6093

vedanuj commented 4 years ago

The config projects/hateful_memes/configs/visual_bert/from_coco.yaml is meant only for training: it sets checkpoint.resume_pretrained=True, which loads just the BERT part of the model from a COCO-pretrained VisualBERT checkpoint. The classification head weights are not loaded, so they are randomly initialized on each run, which is why your validation numbers change every time.
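You can verify the difference yourself from an mmf checkout with a quick sketch like the one below (it assumes the checkpoint block is defined inline in these files rather than pulled in via an include; if defaults.yaml prints nothing, that is the point, as it does not force resume_pretrained):

# Run from the root of the mmf repository (assumption: checkpoint settings are inline)
grep -A 3 "checkpoint:" projects/hateful_memes/configs/visual_bert/from_coco.yaml
grep -A 3 "checkpoint:" projects/hateful_memes/configs/visual_bert/defaults.yaml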

For validation and inference, when you are loading a finetuned model, use this config: projects/hateful_memes/configs/visual_bert/defaults.yaml

So your command should be:

mmf_run config=projects/hateful_memes/configs/visual_bert/defaults.yaml model=visual_bert dataset=hateful_memes run_type=val checkpoint.resume_zoo=visual_bert.finetuned.hateful_memes.from_coco

jinhyun95 commented 4 years ago

@vedanuj Thanks!! The same applies to ViLBERT, I suppose?

apsdehal commented 4 years ago

@jinhyun95 Yes, you are right.
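For reference, the analogous ViLBERT command would presumably look like this (the zoo key vilbert.finetuned.hateful_memes.from_cc_original is an assumption based on the naming of the VisualBERT key above; check the MMF model zoo for the exact name):

# zoo key assumed by analogy; verify against the model zoo before running
mmf_run config=projects/hateful_memes/configs/vilbert/defaults.yaml model=vilbert dataset=hateful_memes run_type=val checkpoint.resume_zoo=vilbert.finetuned.hateful_memes.from_cc_original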

purvaten commented 4 years ago

Hi @vedanuj I ran the same command that you mentioned for validation:

mmf_run config=projects/hateful_memes/configs/visual_bert/defaults.yaml model=visual_bert dataset=hateful_memes run_type=val checkpoint.resume_zoo=visual_bert.finetuned.hateful_memes.from_coco

However, I am not able to reproduce the numbers from the paper exactly.

For the above command, for VisualBERT finetuned on COCO, I get roc_auc=0.7342 and accuracy=0.6320, which are slightly lower than those reported in the paper.

For ViLBERT finetuned on Conceptual Captions, I get roc_auc=0.7067 and accuracy=0.6280, which are slightly higher than those reported in the paper.

Am I missing something?

vedanuj commented 4 years ago

Hi @purvaten, the numbers reported in the paper are averages over multiple runs with different seeds, so slight differences are expected.
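To mirror that protocol, one would fine-tune several times with different seeds and average the per-run validation metrics. A rough sketch follows; treating training.seed and env.save_dir as the relevant CLI overrides is an assumption about MMF, not something confirmed in this thread:

# Assumption: training.seed sets the RNG seed and env.save_dir isolates each run's outputs
for seed in 1 2 3; do
  mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes run_type=train_val training.seed=$seed env.save_dir=./save/visual_bert_seed_$seed
done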

purvaten commented 4 years ago

@vedanuj I see, makes sense. Thanks for the clarification!