facebookresearch / grid-feats-vqa

Grid features pre-training code for visual question answering
https://arxiv.org/abs/2001.03615
Apache License 2.0

code on VQA and Caption #9

Closed: he-y closed this issue 4 years ago

he-y commented 4 years ago

Thanks for the great work! Will you release the downstream VQA and Caption code? If so, when? Thank you.

endernewton commented 4 years ago

For VQA, the code will be released under MMF (see a beta version at https://github.com/facebookresearch/mmf/tree/vqa_winner). For captioning, I think we directly used the captioning repo from BUTD. We already provide the features, so it should not be too hard to put together a captioning pipeline.

playerkk commented 4 years ago

In the paper, we used Pythia (which is now in MMF) and MCAN (https://github.com/MILVLG/mcan-vqa) for VQA. For captioning, we used Pythia. It is straightforward to replace the region features with our grid version for such models.

End-to-end trained VQA will be provided in MMF.
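
For readers wondering what replacing region features with grid features might look like in practice, here is a minimal sketch. It is not taken from this repo, MMF, or MCAN. It assumes the grid features for one image are a NumPy array shaped (C, H, W), and that the downstream model expects region features as an (N, C) matrix with one row per region; both layouts are assumptions, and `grid_to_region_layout` is a hypothetical helper name. Under those assumptions, each grid cell is simply treated as one "region".

```python
import numpy as np


def grid_to_region_layout(grid_feats):
    """Flatten a (C, H, W) grid feature map into an (H*W, C) matrix.

    Hypothetical helper: the (C, H, W) input layout and the (N, C) output
    layout are assumptions, not something confirmed in this thread.
    """
    if grid_feats.ndim != 3:
        raise ValueError("expected a 3-D array shaped (C, H, W)")
    channels, height, width = grid_feats.shape
    # Move channels last, then merge the two spatial axes row by row,
    # so row index y * width + x holds the feature vector of cell (y, x).
    return grid_feats.transpose(1, 2, 0).reshape(height * width, channels)


# Toy example: 2 channels on a 3x4 grid (real features would be much larger).
grid = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
regions = grid_to_region_layout(grid)
print(regions.shape)  # (12, 2)
```

Unlike region features, the number of rows here depends on the image size (H*W), so a model that assumes a fixed number of regions per image may need padding or masking.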