facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

How to extract features for Hateful Memes? #896

Closed shivgodhia closed 3 years ago

shivgodhia commented 3 years ago

@vedanuj Sorry to bother you. I noticed this repository is yours: https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark

Background

I have tried extracting features using extract_features_vmb.py. I then used lmdb_conversion to extract the detectron.lmdb features that are automatically downloaded from the FB servers (they are stored in /home/sgg29/.cache/torch/mmf/data/datasets/hateful_memes/defaults/features/detectron.lmdb). Call these the ground-truth features.
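For reference, this is roughly how I peek inside the downloaded detectron.lmdb with the lmdb and pickle modules. It is only a sketch: the key names ("keys", "features"/"feature") are my assumption about the MMF layout and may need adjusting to whatever lmdb_conversion actually writes.

```python
# Minimal sketch for inspecting detectron.lmdb (assumed MMF layout:
# one pickled dict per image; key names may differ in practice).
import lmdb
import pickle
import numpy as np

lmdb_path = "/home/sgg29/.cache/torch/mmf/data/datasets/hateful_memes/defaults/features/detectron.lmdb"

env = lmdb.open(lmdb_path, readonly=True, lock=False)
with env.begin(write=False) as txn:
    for key, value in txn.cursor():
        if key == b"keys":          # assumed: list of image ids stored under this key
            continue
        item = pickle.loads(value)  # assumed: each entry is a pickled dict
        print(key, type(item))
        if isinstance(item, dict):
            print(sorted(item.keys()))
            feats = item.get("features", item.get("feature"))
            if feats is not None:
                print(np.asarray(feats).shape)  # expect something like (100, 2048)
        break  # inspect only the first entry
```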

Result

I then loaded both sets with numpy and compared the features produced by extract_features_vmb.py against the ground-truth features extracted in the paragraph above.

Both have shape (100, 2048), but for the same image the loaded numpy arrays contain different values, so the features do not match (I'm not sure how large the difference is, only that they are not identical).
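For what it's worth, this is the kind of quick check I mean when quantifying the difference (the .npy file names are placeholders for however each feature set was saved):

```python
# Quantify how different the two feature files are for one image.
import numpy as np

mine = np.load("extracted/img_12345.npy")          # from extract_features_vmb.py
reference = np.load("ground_truth/img_12345.npy")  # dumped from detectron.lmdb

assert mine.shape == reference.shape == (100, 2048)

diff = np.abs(mine - reference)
print("max abs diff :", diff.max())
print("mean abs diff:", diff.mean())
print("allclose     :", np.allclose(mine, reference, atol=1e-4))
```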

Where I need help

Can you tell me how exactly detectron.lmdb was created?

Referring to the hateful memes paper:

We evaluate two image encoders: 1) standard ResNet-152 [30] convolutional features from res-5c with average pooling (Image-Grid) 2) features from fc6 layer of Faster-RCNN [60] with ResNeXt- 152 as its backbone [86]. The Faster-RCNN is trained on Visual Genome [43] with attribute loss following [69] and features from fc6 layer are fine-tuned using weights of the fc7 layer (Image- Region). For the textual modality, the unimodal model is BERT [14] (Text BERT).

I just don't understand where the model is. It doesn't seem to be either of these:

MODEL_URL = { "X-101": "https://dl.fbaipublicfiles.com/pythia/"

apsdehal commented 3 years ago

Looking into it. It looks like a recent change by Brett has caused some issues and a difference in the features. If I revert to the original code before that change, I am able to replicate the same features with batch size 2. Looking into what is causing the diff.

shivgodhia commented 3 years ago

@apsdehal this issue might be relevant: https://github.com/facebookresearch/mmf/issues/720

Also, just for your info, I've run a test on the validation set using VisualBERT COCO. You can see the results here. I fine-tuned the model, so the statistics are not the same as in the original paper. In orange is what I get when I use mmf_run to generate the statistics. The commented-out numbers are what I get when I extract the features myself using the vmb script (no rollbacks, i.e. including Brett's changes).

[screenshot: validation set results comparison]

apsdehal commented 3 years ago

I haven't had a chance to look at this, but can you roll back to f5ff2c8d2f0461b2d5a9d3aeac26b78ea4079e43 and try? I was able to get the same features as the original ones shared with the HM dataset in MMF by rolling back to that commit.

shivgodhia commented 3 years ago

@apsdehal Sorry, I haven't been able to, and probably won't be able to for about a month, as I have modified the code to add an extra argument and make it callable at runtime (for model inference on new memes input by a user). I'm currently not really working on this since my dissertation is almost complete. If it's still a problem by June, I'll take a look and debug it. Thanks again for all your help these past months.

shivgodhia commented 3 years ago

@apsdehal Take a look at this. This is probably a large part of the reason why the extracted features are not the same (the sorting). I also haven't looked at the part after the sorting; that may make a difference too.

[screenshot: the sorting code in question]
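If the sorting really is the culprit, the two (100, 2048) matrices might still contain the same rows in a different order. A rough diagnostic sketch of that check (array file names are placeholders, and the lexicographic sort is only approximate in the presence of small float differences):

```python
# Check whether two feature matrices hold the same rows up to a permutation,
# which is what you'd expect if only the ordering of region proposals changed.
import numpy as np

def rows_match_up_to_order(a, b, atol=1e-4):
    # Sort the rows of each matrix into a canonical (lexicographic) order,
    # then compare elementwise.
    a_sorted = a[np.lexsort(a.T)]
    b_sorted = b[np.lexsort(b.T)]
    return np.allclose(a_sorted, b_sorted, atol=atol)

mine = np.load("extracted/img_12345.npy")
reference = np.load("ground_truth/img_12345.npy")
print(rows_match_up_to_order(mine, reference))
```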