MILVLG / mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering
Apache License 2.0

Features' file loading in the code #36

Closed: ifmaq1 closed this issue 3 years ago

ifmaq1 commented 3 years ago

Hi, I am confused about how the code handles the "bbox" information. As far as I can see, only the image features x are loaded from the ".npz" file.
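A minimal sketch of what loading region features from an ".npz" file with an optional bbox looks like. Three things here are assumptions and not taken from the repo: the features are stored under the key `x` with shape (feat_dim, num_regions), the boxes are stored under a key `bbox` with shape (num_regions, 4), and each image is zero-padded to a fixed number of regions. The function name `load_region_features` and its `pad_size`/`load_bbox` parameters are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_region_features(path, pad_size=100, load_bbox=False):
    """Hypothetical loader: read region features (and optionally boxes) from .npz.

    Assumes key 'x' holds features as (feat_dim, num_regions) and key 'bbox'
    holds boxes as (num_regions, 4). Features are zero-padded to pad_size rows.
    """
    with np.load(path) as data:
        feats = data["x"].transpose((1, 0))[:pad_size]  # -> (num_regions, feat_dim)
        boxes = data["bbox"][:pad_size] if load_bbox else None

    padded = np.zeros((pad_size, feats.shape[1]), dtype=feats.dtype)
    padded[: feats.shape[0]] = feats  # rows past num_regions stay zero
    return padded, boxes


# Usage: write a fake 36-region feature file, then load it both ways.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")
    rng = np.random.default_rng(0)
    np.savez(path,
             x=rng.random((2048, 36), dtype=np.float32),
             bbox=rng.random((36, 4), dtype=np.float32))
    feats, boxes = load_region_features(path)
    feats_b, boxes_b = load_region_features(path, load_bbox=True)

print(feats.shape, boxes)  # (100, 2048) None
print(boxes_b.shape)       # (36, 4)
```

When `load_bbox` is left at its default, the box array is never read. This matches the idea that the model can run on the features alone.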

Also, it is mentioned that grid features can be used as well. A grid feature file (".pth" extension) contains only the feature tensor, e.g. of size [1, 2048, 19, 29] for one sample file, with no bounding-box or object-detection information. How can these features be used without any such information?
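One common way to use a grid feature map without any boxes is to treat each spatial cell as a "region" and flatten the map into a sequence. A [1, 2048, 19, 29] tensor then becomes 19 × 29 = 551 vectors of size 2048. This is a sketch of that general technique, not necessarily what this repo does internally. It assumes the ".pth" tensor has already been turned into a NumPy array, for example with `torch.load(path).numpy()`. The helper name `grid_to_sequence` is hypothetical.

```python
import numpy as np


def grid_to_sequence(grid):
    """Hypothetical helper: flatten a (1, C, H, W) grid into an (H*W, C) sequence.

    Each spatial cell is treated as one "region", so no bounding boxes are needed.
    """
    if grid.ndim != 4 or grid.shape[0] != 1:
        raise ValueError(f"expected shape (1, C, H, W), got {grid.shape}")
    _, c, h, w = grid.shape
    # (1, C, H, W) -> (C, H*W) -> (H*W, C); row i*W + j is the cell at (i, j)
    return grid.reshape(c, h * w).transpose(1, 0)


grid = np.zeros((1, 2048, 19, 29), dtype=np.float32)
seq = grid_to_sequence(grid)
print(seq.shape)  # (551, 2048)
```

The resulting (551, 2048) array has the same layout as a set of region features, so it can be padded and fed to the same attention layers. The only difference is that the "regions" are fixed grid cells instead of detected objects.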

MIL-VLG commented 3 years ago

The bbox information is an optional feature in this repo; our models do not use it by default.
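To show what "optional, off by default" can mean in practice, here is a sketch of a hypothetical input builder with a `use_bbox` flag. When the flag is off, which is the default, the region features pass through unchanged. When it is on, the box coordinates are normalized by the image size and appended to each region vector. The flag, the function name, and the choice to concatenate normalized boxes are all illustrative assumptions. They are not the repo's API or its exact bbox encoding.

```python
import numpy as np


def build_image_input(feats, boxes=None, img_w=None, img_h=None, use_bbox=False):
    """Hypothetical builder: return per-region input vectors.

    use_bbox=False (default): features are returned unchanged; boxes are ignored.
    use_bbox=True: (x1, y1, x2, y2) boxes are divided by image size and appended.
    """
    if not use_bbox:
        return feats
    if boxes is None or img_w is None or img_h is None:
        raise ValueError("use_bbox=True requires boxes, img_w and img_h")
    scale = np.array([img_w, img_h, img_w, img_h], dtype=feats.dtype)
    norm_boxes = boxes.astype(feats.dtype) / scale  # in [0, 1] for boxes inside the image
    return np.concatenate([feats, norm_boxes], axis=1)


feats = np.ones((36, 2048), dtype=np.float32)
boxes = np.tile(np.array([[0, 0, 320, 240]], dtype=np.float32), (36, 1))
print(build_image_input(feats).shape)                                 # (36, 2048)
print(build_image_input(feats, boxes, 640, 480, use_bbox=True).shape)  # (36, 2052)
```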