facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

What is the feature extractor used for the pretrained M4C model provided for textvqa? #1263

Open onuriel opened 1 year ago

onuriel commented 1 year ago

What is the feature extractor used for the pretrained M4C model provided for textvqa?

I noticed the feature extraction script: https://github.com/facebookresearch/mmf/blob/main/tools/scripts/features/extract_features_vmb.py

I would like to recreate mmf/data/datasets/textvqa/defaults/features/open_images/detectron.lmdb exactly as it is currently used for training and evaluating the provided M4C model. More specifically, I want to create an additional dataset to use with the same pretrained model provided in your repo: https://mmf.sh/docs/projects/m4c#pretrained-m4c-models. To confirm that my extraction produces the exact same features, I first need to reproduce the existing detectron.lmdb.

The script offers two model options:

python tools/scripts/features/extract_features_vmb.py --model_name=X-152 --image_dir=[some folder] --output_folder=[some_folder]

python tools/scripts/features/extract_features_vmb.py --model_name=X-101 --image_dir=[some folder] --output_folder=[some_folder]

I tried both of them on the TextVQA dataset, but neither reproduces the exact features provided in mmf/data/datasets/textvqa/defaults/features/open_images/detectron.lmdb.
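For reference, "exact features" here could be checked with a tolerance-based comparison rather than bitwise equality, since floating-point output can differ slightly across hardware and library versions. The following is only a minimal sketch: the function name `compare_features` and the tolerance `atol` are hypothetical and not part of MMF, and it assumes the features for one image have already been loaded from both sources as rectangular boxes x dims nested lists (for example via `.tolist()` on an array). Loading from the LMDB or from the script's output is deliberately left out.

```python
def compare_features(a, b, atol=1e-4):
    """Compare two feature matrices given as nested sequences (boxes x dims).

    Hypothetical helper, not part of MMF. Assumes both inputs are
    rectangular, i.e. every row has the same length as the first row.
    """
    shape_a = (len(a), len(a[0]) if len(a) else 0)
    shape_b = (len(b), len(b[0]) if len(b) else 0)

    # Different box counts or feature dims already rule out a match.
    if shape_a != shape_b:
        return {
            "same_shape": False,
            "shape_a": shape_a,
            "shape_b": shape_b,
            "max_abs_diff": None,
            "match": False,
        }

    # Largest element-wise absolute difference over all boxes and dims.
    max_diff = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            max_diff = max(max_diff, abs(x - y))

    return {
        "same_shape": True,
        "shape_a": shape_a,
        "shape_b": shape_b,
        "max_abs_diff": max_diff,
        "match": max_diff <= atol,
    }


if __name__ == "__main__":
    # Toy values only; real features would come from the LMDB and the script.
    provided = [[0.10, 0.20], [0.30, 0.40]]
    extracted = [[0.10, 0.20], [0.30, 0.45]]
    print(compare_features(provided, extracted))
```

A shape mismatch (a different number of boxes or a different feature dimension) would point to a different extraction configuration, while a small nonzero `max_abs_diff` with matching shapes would more likely be numerical noise. Note that a row-by-row comparison also assumes the boxes come out in the same order in both sources.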