facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Batch size affects feature values in extract_features_vmb.py #720

Closed vecxoz closed 3 years ago

vecxoz commented 3 years ago

❓ Questions and Help

Hi,

I get different feature values for the same image when I run extract_features_vmb.py with different batch sizes. What could be the reason? Please check this example:

!mkdir img
!curl -o img/a.jpg https://upload.wikimedia.org/wikipedia/commons/e/ed/McIntosh.jpg
!curl -o img/b.jpg https://upload.wikimedia.org/wikipedia/commons/e/ef/Summerred.jpg

!python mmf/tools/scripts/features/extract_features_vmb.py \
    --image_dir=img \
    --batch_size=1 \
    --output_folder=out_batch_1

!python mmf/tools/scripts/features/extract_features_vmb.py \
    --image_dir=img \
    --batch_size=2 \
    --output_folder=out_batch_2

import numpy as np
print(np.load('out_batch_1/a.npy')[10, :10])
print('----')
print(np.load('out_batch_2/a.npy')[10, :10])

[ 0.         0.         0.         0.         0.        13.322002
  0.         0.         0.         2.7472088]
----
[ 0.          0.          0.          0.31102878  2.5339954   6.991492
 10.115531    0.          0.          4.991534  ]

hackgoofer commented 3 years ago

Hi @vecxoz, thank you for using mmf!

You are right: in your example the two sets of feature values are different. This is because we expect all images within a batch to have the same size. With batch_size = 1, each batch holds a single image, so that image keeps its own size. With batch_size = 2, one of the images has to be resized to match the other, so the feature values differ.
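One quick way to see whether a folder will run into this is to group the images by pixel dimensions before extraction. The following is a minimal sketch, assuming Pillow is installed; `group_by_size` and `pil_size` are hypothetical helper names for illustration, not part of MMF.

```python
from collections import defaultdict
from pathlib import Path


def pil_size(path):
    """Return (width, height) of an image file using Pillow (assumed installed)."""
    from PIL import Image  # imported lazily so the grouping logic has no hard dependency

    with Image.open(path) as im:
        return im.size


def group_by_size(paths, get_size=pil_size):
    """Map each (width, height) to the list of image paths that have that size."""
    groups = defaultdict(list)
    for p in paths:
        groups[tuple(get_size(p))].append(str(p))
    return dict(groups)


if __name__ == "__main__":
    # "img" is the folder used in the example above.
    groups = group_by_size(sorted(Path("img").glob("*.jpg")))
    for size, files in groups.items():
        print(size, files)
    if len(groups) > 1:
        print("Mixed sizes: features may depend on which images share a batch.")
```

If more than one size is reported, the explanation above would predict that features can change depending on which images are batched together.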

You can double-check this behavior by passing two images of the same size into the model. The feature values should then be the same regardless of batch size.
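To run that check on the saved outputs, something like the sketch below could be used. It only assumes NumPy and the `.npy` files written by the commands in the question; `max_abs_diff` is a hypothetical helper name, not an MMF function.

```python
import numpy as np


def max_abs_diff(path_a, path_b):
    """Load two saved feature arrays and return their largest element-wise difference.

    Returns None when the shapes differ, since the arrays are then not comparable.
    """
    a = np.load(path_a)
    b = np.load(path_b)
    if a.shape != b.shape:
        return None
    if a.size == 0:
        return 0.0
    return float(np.max(np.abs(a - b)))


# Example with the output folders from the question:
# print(max_abs_diff("out_batch_1/a.npy", "out_batch_2/a.npy"))
```

Even for same-sized images, tiny floating-point differences between runs may remain, so comparing against a small tolerance (for example with `np.allclose`) is probably safer than testing for exact equality.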

Hope this helps! Thank you.

vecxoz commented 3 years ago

Got it, many thanks for the help, @ytsheng. MMF is great!

shivgodhia commented 3 years ago

@ytsheng Do you know what size we expect the images to be? Is it supposed to be uniform across the dataset? I am wondering if it would be possible to replicate the feature extraction that Facebook did to create the detectron.lmdb feature datasets. I'm specifically working with Hateful Memes.