MILVLG / openvqa

A lightweight, scalable, and general framework for visual question answering research
Apache License 2.0

Dataset setup for custom data #60

Closed: ihabZhaika closed this issue 4 years ago

ihabZhaika commented 4 years ago

Hey, the Dataset Setup page of the wiki says:

We store the features for each image in a .npz file. You can prepare the visual features by yourself or download the extracted features

My question is how to train on custom data and how to

prepare the visual features by yourself

  1. Is it the original image converted to NumPy format?
  2. About "each image being represented as a dynamic number (from 10 to 100) of 2048-D": is there a helper function that does this transformation? And why 10 to 100? Why 2048-D?
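To make the first question concrete, here is a minimal sketch of what one per-image .npz file might hold, with random arrays standing in for real Faster R-CNN outputs. The key names (`x`, `bbox`, `image_h`, `image_w`), the (num_boxes, 2048) axis order, and the helper names are assumptions for illustration only; the actual extraction script and openvqa loader may use different keys or store the transpose.

```python
import io

import numpy as np

FEAT_DIM = 2048  # per-region feature size, as in the wiki's "2048-D"


def save_image_features(fileobj, feats, boxes, image_h, image_w):
    """Store one image's region features as .npz (hypothetical layout)."""
    assert feats.ndim == 2 and feats.shape[1] == FEAT_DIM
    assert boxes.shape == (feats.shape[0], 4)  # one (x1, y1, x2, y2) box per region
    np.savez_compressed(fileobj, x=feats, bbox=boxes, image_h=image_h, image_w=image_w)


def load_image_features(fileobj):
    """Return (features, boxes, (h, w)) from a file written by save_image_features."""
    with np.load(fileobj) as data:
        return data["x"], data["bbox"], (int(data["image_h"]), int(data["image_w"]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_boxes = 36  # any count in the 10..100 range
    feats = rng.random((num_boxes, FEAT_DIM), dtype=np.float32)
    boxes = rng.random((num_boxes, 4), dtype=np.float32) * 480
    buf = io.BytesIO()  # in-memory stand-in for a .npz file on disk
    save_image_features(buf, feats, boxes, image_h=480, image_w=640)
    buf.seek(0)
    x, bbox, hw = load_image_features(buf)
    print(x.shape, bbox.shape, hw)
```

The point is that each file holds a small matrix of region features (one row per detected region) plus box metadata, not pixels.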
MIL-VLG commented 4 years ago

Hi, the .npz files are not the original images but region-based features extracted with a pre-trained Faster R-CNN model. For the feature extraction, please refer to the bottom-up-attention project; you can use our script there to extract the features (see the bottom of the README in the bottom-up-attention project). Please note that Caffe is required to run the feature extraction.
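On the "why 10 to 100" question: as far as I can tell from the bottom-up-attention `generate_tsv.py` script, a detection is kept if its best class confidence clears a threshold, and the number of kept boxes is then clamped to the range [10, 100]. That is why the region count varies per image. The 2048 is the dimension of the pooled ResNet-101 feature computed for each region. The sketch below reproduces only that selection rule on made-up scores. The function name `select_regions` and the 0.2 threshold are assumptions, and it omits the per-class NMS step the real script runs first.

```python
import numpy as np

MIN_BOXES, MAX_BOXES = 10, 100  # bounds quoted in the openvqa wiki
CONF_THRESH = 0.2  # assumed detection-confidence threshold


def select_regions(max_conf, min_boxes=MIN_BOXES, max_boxes=MAX_BOXES, thresh=CONF_THRESH):
    """Return indices of regions to keep, given each box's best class score.

    Hypothetical helper: keep boxes scoring >= thresh, but never fewer than
    min_boxes or more than max_boxes (highest-scoring boxes win).
    """
    max_conf = np.asarray(max_conf)
    keep = np.where(max_conf >= thresh)[0]
    order = np.argsort(-max_conf)  # box indices, highest score first
    if len(keep) < min_boxes:
        keep = order[:min_boxes]
    elif len(keep) > max_boxes:
        keep = order[:max_boxes]
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random(300)  # made-up scores for 300 proposals
    feats = rng.random((300, 2048), dtype=np.float32)  # one 2048-D vector per proposal
    kept = select_regions(scores)
    print(len(kept), feats[kept].shape)
```

Indexing the feature matrix with the kept indices yields the dynamic (10 to 100) x 2048 array that ends up in each .npz file.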

ihabZhaika commented 4 years ago


Hey, I saw it. I will try it and post an update.