MILVLG / openvqa

A lightweight, scalable, and general framework for visual question answering research
Apache License 2.0

What are the results of the MCAN model when using frcn_feat + bbox_feat on the VQA v2.0 dataset? #69

Closed: haoopan closed this issue 3 years ago

haoopan commented 3 years ago

Hi, this is great work by your team, and I'm following it closely. How does the MCAN model perform when using frcn_feat + bbox_feat on the VQA v2.0 dataset? Is it better than using only frcn_feat? If not, could you tell me what you think the reason is? Thank you very much :)

MIL-VLG commented 3 years ago

Directly applying bbox_feat to MCAN did not give any improvement in our experiments. We are still not sure of the reason. Our guess is that the spatial information should be used in a more elegant way, e.g., the relational self-attention used in our new paper MMnasNet, which has also been implemented in the openvqa project.
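
To make the contrast concrete, here is a minimal NumPy sketch. It is not taken from the openvqa code. It assumes that "directly applying" bbox_feat means appending box coordinates to each region feature, and that relational self-attention adds a pairwise box-geometry bias to the attention logits, in the style of Relation Networks. The function names (`concat_bbox`, `relative_geometry`, `relation_attention`), the weight vector `w_geo`, and the `(x1, y1, x2, y2)` box layout are all hypothetical, and the query/key projections and geometry embedding are omitted for brevity.

```python
import numpy as np


def concat_bbox(frcn_feat, bbox_feat):
    """Direct use (assumed): append box coordinates to each region feature."""
    return np.concatenate([frcn_feat, bbox_feat], axis=-1)


def relative_geometry(boxes, eps=1e-6):
    """Pairwise log-scale geometry between boxes given as (x1, y1, x2, y2)."""
    w = boxes[:, 2] - boxes[:, 0] + eps
    h = boxes[:, 3] - boxes[:, 1] + eps
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    dx = np.log(np.abs(cx[:, None] - cx[None, :]) / w[:, None] + eps)
    dy = np.log(np.abs(cy[:, None] - cy[None, :]) / h[:, None] + eps)
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)  # shape (n, n, 4)


def relation_attention(x, boxes, w_geo, eps=1e-6):
    """Single-head self-attention with a geometry bias on the logits."""
    n, d = x.shape
    # Appearance term; identity query/key projections keep the sketch short.
    logits = x @ x.T / np.sqrt(d)
    # Hypothetical learned weights map each pair's geometry to a scalar >= 0.
    geo = np.maximum(relative_geometry(boxes) @ w_geo, 0.0)
    logits = logits + np.log(geo + eps)
    # Numerically stable softmax over each row.
    logits = logits - logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ x, attn


rng = np.random.default_rng(0)
frcn_feat = rng.normal(size=(3, 8))           # 3 regions, 8-dim toy features
boxes = np.array([[0.1, 0.1, 0.4, 0.5],
                  [0.3, 0.2, 0.9, 0.8],
                  [0.5, 0.5, 0.7, 0.9]])      # normalized box coordinates
fused = concat_bbox(frcn_feat, boxes)         # shape (3, 12)
out, attn = relation_attention(frcn_feat, boxes, rng.normal(size=4))
```

In the first variant each region only carries its own absolute position, while in the second the geometry between every pair of regions changes how strongly they attend to each other. That difference may be one reason plain concatenation does not help, but it is a guess in the same spirit as the reply above.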