facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Not able to reproduce vilbert validation accuracy on vqa2 after feature extraction #1147

Open DevSheth opened 2 years ago

DevSheth commented 2 years ago

❓ Questions and Help

Hey, I am trying to reproduce the results of ViLBERT on the VQA2 dataset. I first ran the command:

mmf_run config=projects/vilbert/configs/vqa2/defaults.yaml \
datasets=vqa2 \
model=vilbert \
run_type=val \
checkpoint.resume_zoo=vilbert.finetuned.vqa2

for which I got the following output:

2021-10-19T07:08:56 | INFO | mmf.utils.general : Total Parameters: 250985529. Trained Parameters: 250985529
2021-10-19T07:08:56 | INFO | mmf.trainers.mmf_trainer : Starting inference on val set
2021-10-19T07:08:56 | INFO | mmf.common.test_reporter : Predicting for vqa2
2021-10-19T07:37:44 | INFO | mmf.trainers.core.evaluation_loop : Finished training. Loaded 447
2021-10-19T07:37:44 | INFO | mmf.trainers.core.evaluation_loop :  -- skipped 0 batches.
2021-10-19T07:37:46 | INFO | mmf.trainers.callbacks.logistics : val/vqa2/logit_bce: 3.6308, val/total_loss: 3.6308, val/vqa2/vqa_accuracy: 0.6905
2021-10-19T07:37:46 | INFO | mmf.trainers.callbacks.logistics : Finished run in 28m 51s 154ms

Then, in order to sanity check the feature extraction, I downloaded the COCO val2014 image set and extracted features with:

python extract_features_vmb.py --model_name=X-101 \
--image_dir=datasets/vqa2/sanity_check/images/val2014 \
--output_folder=datasets/vqa2/sanity_check/features \
--batch_size 16
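Before rerunning evaluation, I spot-checked one output pair with a quick script. The file name below is illustrative (any image from the output folder works), and the expected shape is my assumption for pooled Faster R-CNN region features:

import numpy as np

# Spot check one extracted feature file and its companion info file
# (file names are illustrative).
feat = np.load("datasets/vqa2/sanity_check/features/COCO_val2014_000000000042.npy")
info = np.load(
    "datasets/vqa2/sanity_check/features/COCO_val2014_000000000042_info.npy",
    allow_pickle=True,  # the info file stores a pickled dict
).item()

print(feat.shape, feat.dtype)  # I expect (num_boxes, 2048) float32 region features
print(sorted(info.keys()))     # bounding-box / image-size metadata on my run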

Running the evaluation loop again on the newly extracted features gives me different results. The only change I made was pointing the val-set features path in the configuration to the newly generated folder datasets/vqa2/sanity_check/features, which contains the *.npy and *_info.npy files. The results obtained after feature extraction are:

2021-11-16T01:39:11 | INFO | mmf.utils.general : Total Parameters: 250985529. Trained Parameters: 250985529
2021-11-16T01:39:11 | INFO | mmf.trainers.mmf_trainer : Starting inference on val set
2021-11-16T01:39:12 | INFO | mmf.common.test_reporter : Predicting for vqa2
2021-11-16T02:10:21 | INFO | mmf.trainers.core.evaluation_loop : Finished training. Loaded 168
2021-11-16T02:10:21 | INFO | mmf.trainers.core.evaluation_loop :  -- skipped 0 batches.
2021-11-16T02:10:24 | INFO | mmf.trainers.callbacks.logistics : val/vqa2/logit_bce: 7.5127, val/total_loss: 7.5127, val/vqa2/vqa_accuracy: 0.3672
2021-11-16T02:10:24 | INFO | mmf.trainers.callbacks.logistics : Finished run in 31m 16s 010ms

There is a huge drop in accuracy (0.6905 vs. 0.3672). I have also compared the feature files downloaded by MMF with the features extracted by the provided script, and the two differ substantially.
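For reference, this is roughly how I compared the two feature sets. The paths are illustrative, and the elementwise comparison assumes both runs produced the same number of boxes for the image:

import numpy as np

# Paths are illustrative: the same image's features from the MMF download
# and from my own extraction run.
a = np.load("path/to/downloaded/features/COCO_val2014_000000000042.npy")
b = np.load("datasets/vqa2/sanity_check/features/COCO_val2014_000000000042.npy")

print(a.shape, b.shape)
if a.shape == b.shape:
    print("max abs diff:", np.abs(a - b).max())
    # Per-box cosine similarity: values near 1.0 would mean the features
    # at least agree in direction even if scales differ.
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    print("mean cosine similarity:", cos.mean())
else:
    print("box counts differ; comparing per-box norms instead")
    print("mean feature norms:", np.linalg.norm(a, axis=1).mean(), np.linalg.norm(b, axis=1).mean())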

Can you point out where I am going wrong in reproducing the results? I checked the ViLBERT paper as well, and it calls for Faster R-CNN features with a ResNet-101 backbone, whereas --model_name=X-101 presumably selects a ResNeXt-101 backbone, so I wonder whether that mismatch contributes to the gap.