Would you like to share your hardware configuration (memory/GPU) and how long training takes? Thanks.
My server has 4x GTX 1070 (8 GB each) + 64 GB RAM + 2x Intel Xeon E5-2620 v4 (2.1 GHz) + 1 TB PCIe SSD + 1 TB SATA SSD.
However, to train one model with Attention with a tiny data loading time, you only need one PyTorch-compatible GPU, 3 threads, and one 500 GB SATA SSD devoted to storing the data and nothing else (WARNING: not the OS).
With a Pascal GPU on VQA 1.0 (VQA 2.0 will be added soon - it has twice as many questions/answers but the same images):
I guess the training time excludes the process of generating the ResNet-152 features.
Generating the train features takes 30 min.
By the way, the features used in our paper are available at https://github.com/Cadene/vqa.pytorch#features
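For reference, here is a minimal sketch of how such 2048 x 14 x 14 features can be extracted with torchvision's ResNet-152. This is not the repo's extraction script; the 448x448 input size, the image filename, and the use of a GPU are assumptions:

```python
# Minimal sketch (not the repo's extraction script) of pulling 2048 x 14 x 14
# ResNet-152 features with torchvision, assuming 448x448 inputs and a GPU.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

resnet = models.resnet152(pretrained=True)
# keep everything up to (and including) layer4, drop avgpool/fc
extractor = nn.Sequential(*list(resnet.children())[:-2]).eval().cuda()

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# hypothetical image path
img = preprocess(Image.open('COCO_train2014_000000000009.jpg').convert('RGB'))
with torch.no_grad():
    feat = extractor(img.unsqueeze(0).cuda())  # -> (1, 2048, 14, 14)
```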
I was under the impression that VQA tasks always take weeks of training. OK, I will try it later.
Hey, why does the SSD need to store data only? Thanks for providing your code!
@ahmedmagdiosman I tried to load data from the SSD I use as a boot drive, and got high data loading times when training models with Attention (for instance, the OS may sometimes write to it or do other blocking work). In fact, you need to load data of dim (batch_size x 2048 x 14 x 14), which is really big. The models without Attention (NoAtt), however, only need to load data of dim (batch_size x 2048). So that is ok :)
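For a rough sense of scale, a back-of-the-envelope sketch assuming a batch size of 128 and float32 features:

```python
# Back-of-the-envelope sketch, assuming a batch size of 128 and float32 features
batch_size = 128
att_bytes = batch_size * 2048 * 14 * 14 * 4   # ~205 MB read per Attention batch
noatt_bytes = batch_size * 2048 * 4           # ~1 MB read per NoAtt batch
print(att_bytes / 1e6, noatt_bytes / 1e6)     # roughly 205.5 vs 1.0 (MB)
```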
An SSD is not required to run the models with Attention, but if you have high data loading times, be sure to use monitoring tools such as atop or htop to locate the bottleneck.
Lastly, I suspect that h5py/HDF5 is not well suited for this kind of read-intensive task. In fact, it seemed to work better in my old Torch7 code with torchnet.IndexedDataset. If I had the time, I would compare h5py/HDF5 and LMDB.
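If someone wants to check this on their own setup, a rough micro-benchmark sketch along these lines could help (the file path and dataset key below are made up; adapt them to your HDF5 layout):

```python
# Rough micro-benchmark sketch for timing random reads of single 2048 x 14 x 14
# features from an HDF5 file with h5py (file path and dataset key are made up).
import random
import time

import h5py

with h5py.File('data/vqa/extract/trainset.hdf5', 'r') as f:
    dset = f['att']                                  # assumed shape: (N, 2048, 14, 14)
    indices = random.sample(range(dset.shape[0]), 100)
    start = time.time()
    for i in indices:
        _ = dset[i]                                  # one random read per sample
    elapsed_ms = (time.time() - start) / len(indices) * 1000
    print('avg read: %.1f ms' % elapsed_ms)
```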
@Cadene Thanks a lot!
I actually had some really slow loading times with the Torch7 code from the MLB paper. I suspect it's also the HDF5 format, since I didn't have a problem loading *.npy files in pycaffe in the original MCB code.
In my experience, the best option is to use a pretrained Caffe model, e.g. the MCB code, and store the tensors as compressed numpy arrays. In that case the whole train set only takes 19 GB, so you can cache it in RAM.
For a Torch example, see github.com/ilija139/vqa-soft
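For illustration, a minimal sketch of the compressed-numpy idea in Python (the filenames are made up, and the random array stands in for a real feature from whatever CNN you use):

```python
# Minimal sketch of the compressed-numpy idea (paths are made up; the random
# array stands in for a real 2048 x 14 x 14 feature from your CNN).
import numpy as np

feat = np.random.rand(2048, 14, 14).astype(np.float32)  # stand-in for a real feature
np.savez_compressed('000000000009.npz', att=feat)

# Later: load it back (keeping the arrays in a dict is an easy way to cache them in RAM)
att = np.load('000000000009.npz')['att']
```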
The VQA dataset is the most widely used dataset in the visual question answering community, as it has the largest volume of human-annotated open-ended answers. Other datasets such as DAQUAR, COCO-QA, or Visual7W are limited in terms of size and annotation quality. These limitations make them less relevant than the VQA dataset for evaluating multimodal fusion models, and we do not provide an implementation for them (feel free to contact us if you need those datasets).
VQA 1.0 is made of several splits: train, val, and test-std (which includes test-dev). The biggest models are trained on the train + val splits as the training set, and the test-dev split is used for validation (on the evaluation server). Thus, for study purposes, the smallest setup provided in this repo uses the train split as trainset and the val split as valset. You can train/eval a model on this setup using the
trainsplit: train
option. See mutan_noatt_train.yaml.
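As a quick sanity check, you can load the options file and look for that entry before launching a run (a sketch only; the path under options/ and the exact key layout are assumptions, so adapt them to the actual yaml):

```python
# Sketch: inspect an options file to see which split it will train on.
# The path and key layout are guesses; adjust them to the real yaml.
import yaml

with open('options/vqa/mutan_noatt_train.yaml') as f:
    options = yaml.safe_load(f)

print(options)  # look for the trainsplit entry, e.g. 'train' or 'trainval'
```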