facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Questions about VisualBert models #617

Closed ChenyuGAO-CS closed 1 year ago

ChenyuGAO-CS commented 4 years ago

❓ Questions and Help

Hi, I found links to several models in models.yaml. Which one corresponds to the task-specific pre-training (on VQA 2.0) model of VisualBERT, i.e., COCO pretraining followed by VQA pretraining? Also, is there an accuracy report under MMF for the three versions of VisualBERT on VQA 2.0 listed below?

  1. COCO pretrain
  2. COCO pretrain + VQA pretrain
  3. COCO pretrain + VQA pretrain + VQA finetune

apsdehal commented 4 years ago

Hi, we didn't do task-specific pretraining on VisualBERT for our paper https://arxiv.org/abs/2004.08744, since our experiments suggested the gains weren't worth the time it takes. But you can do it on your own in MMF; see the sketch below.
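As a rough sketch, the two stages could be run with the `mmf_run` CLI roughly like this. The config paths, the `masked_vqa2` dataset key, the `visual_bert.pretrained.coco` zoo key, and the saved checkpoint filename are assumptions here — verify them against `projects/visual_bert/configs/` and the model zoo entries in `models.yaml` in your MMF checkout:

```sh
# Sketch only: paths and zoo/dataset keys below are assumptions;
# check projects/visual_bert/configs/ and models.yaml in your checkout.

# 1. Task-specific pretraining: start from the COCO-pretrained
#    VisualBERT checkpoint in the zoo and continue masked
#    pretraining on VQA 2.0.
mmf_run config=projects/visual_bert/configs/masked_vqa2/defaults.yaml \
    model=visual_bert \
    dataset=masked_vqa2 \
    run_type=train \
    checkpoint.resume_zoo=visual_bert.pretrained.coco

# 2. Finetuning: resume from the checkpoint produced above
#    (default save dir is ./save; the exact filename may differ)
#    and train on the VQA 2.0 answering task.
mmf_run config=projects/visual_bert/configs/vqa2/defaults.yaml \
    model=visual_bert \
    dataset=vqa2 \
    run_type=train_val \
    checkpoint.resume_file=./save/visual_bert_final.pth
```

`checkpoint.resume_zoo` loads a checkpoint by its model-zoo key, while `checkpoint.resume_file` points at a local file, which is why the two stages use different options here.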

apsdehal commented 4 years ago

In the model zoo, we have