Closed: shivgodhia closed this issue 3 years ago
@vedanuj Sorry to bother you, I noticed this repository is yours: https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark
I have tried extracting features using extract_features_vmb.py instead. I then used lmdb_conversion to extract the detectron.lmdb features that are automatically downloaded from FB servers (they are stored in /home/sgg29/.cache/torch/mmf/data/datasets/hateful_memes/defaults/features/detectron.lmdb). Call these the ground-truth features.
Then I loaded both sets of features with numpy and compared the output of extract_features_vmb.py against the ground-truth features described above.
The shape is now (100, 2048) for both, so I'm on the right track (Brett's feature extractor produced features with shape (36, 2048)). But the numpy arrays loaded for the same image are not equal, so the features are still different (I'm not sure how different, but they're different).
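For concreteness, the comparison I'm running looks roughly like this (the helper name and paths are mine, not from mmf; I load the .npy dumps from extract_features_vmb.py and from lmdb_conversion with numpy):

```python
import numpy as np

def feature_diff_report(a, b, atol=1e-5):
    """Compare two (num_boxes, feat_dim) feature arrays, e.g. (100, 2048)."""
    if a.shape != b.shape:
        return {"match": False, "reason": f"shape mismatch {a.shape} vs {b.shape}"}
    diff = np.abs(a - b)
    return {
        "match": bool(np.allclose(a, b, atol=atol)),
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
    }

# usage with the two dumps for the same image (hypothetical paths):
# mine = np.load("features_vmb/01235.npy")
# gt = np.load("features_from_lmdb/01235.npy")
# print(feature_diff_report(mine, gt))
```

This is how I know the arrays differ: the shapes match but `match` comes back False.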
Can you tell me how exactly detectron.lmdb was created?
Referring to the hateful memes paper:
We evaluate two image encoders: 1) standard ResNet-152 [30] convolutional features from res-5c with average pooling (Image-Grid) 2) features from fc6 layer of Faster-RCNN [60] with ResNeXt-152 as its backbone [86]. The Faster-RCNN is trained on Visual Genome [43] with attribute loss following [69] and features from fc6 layer are fine-tuned using weights of the fc7 layer (Image-Region). For the textual modality, the unimodal model is BERT [14] (Text BERT).
I don't get the last bit: "and features from fc6 layer are fine-tuned using weights of the fc7 layer". I think this is what I'm missing. How do I do that?
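For what it's worth, the first encoder (Image-Grid) is the part I do follow: the res-5c convolutional features are simply average-pooled over the spatial dimensions. A minimal numpy sketch (shapes assumed from the paper; the real pipeline of course uses the actual ResNet-152 activations, not random data):

```python
import numpy as np

# hypothetical res-5c output for one image: (channels, H, W) = (2048, 7, 7)
conv_feats = np.random.rand(2048, 7, 7)

# average pooling over the spatial dims yields one 2048-d Image-Grid vector
image_grid = conv_feats.mean(axis=(1, 2))
```

It's the fc6/fc7 fine-tuning step for Image-Region that I can't reproduce.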
🐛 Bug
extract_features_frcnn.py might not be extracting the same features as those used for the original dataset.
To Reproduce
It's tricky to reproduce, but essentially I created a Predictor class that loads the model, takes an image (a path to the PNG file) and text, applies the required transforms to the data, builds a sample list, and runs it through the model.
Using this, I ran the model on all the images in the validation set and computed the statistics. I also did the same using mmf_run to see what the official implementation of the model would get.
This worked perfectly (identical accuracy and roc_auc scores down to the fourth decimal place) for Image-Grid, Text BERT, Concat BERT, and Late Fusion. It did not work for Visual BERT, and when I tried Image-Region (which uses features) it also did not work.
Thus I conclude that the problem lies somewhere in feature extraction. Either I'm not constructing the sample list with the features correctly, or the features themselves are very different and not usable in the model.
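For reference, the statistics I compute over the validation set are just accuracy and roc_auc from the model's hatefulness probabilities. A self-contained numpy version of what I use to sanity-check the numbers (roc_auc via the Mann-Whitney pairwise formulation, ties counted as half):

```python
import numpy as np

def accuracy(labels, probs, thresh=0.5):
    """Fraction of examples where the thresholded probability equals the label."""
    return float(((probs >= thresh).astype(int) == labels).mean())

def roc_auc(labels, probs):
    """AUC = probability that a random positive outranks a random negative."""
    pos = probs[labels == 1]
    neg = probs[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))
```

This agrees with sklearn's roc_auc_score, so the gaps below are not a metric artifact.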
Code
This is for Image-Region.
A similar concept applies for Visual BERT.
Expected behavior
The same, or at least very similar, accuracy and roc_auc scores on the validation set are expected.
Image-Region ground truth: val/hateful_memes/accuracy: 0.5759, val/hateful_memes/binary_f1: 0.1358, val/hateful_memes/roc_auc: 0.4790
Image-Region what I got: val/hateful_memes/accuracy: 0.4860, val/hateful_memes/binary_f1: 0.5499, val/hateful_memes/roc_auc: 0.5198
Note the wildly differing scores for Visual BERT COCO:
Visual BERT COCO ground truth: val/hateful_memes/accuracy: 0.6840, val/hateful_memes/binary_f1: 0.6010, val/hateful_memes/roc_auc: 0.7559
Visual BERT COCO what I got: val/hateful_memes/accuracy: 0.5540, val/hateful_memes/binary_f1: 0.3989, val/hateful_memes/roc_auc: 0.6127
Environment
Output of the environment collection script:
Collecting environment information...
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 11.1 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.28)
CMake version: version 3.19.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] pytorch-lightning==1.2.7
[pip3] torch==1.8.1
[pip3] torchmetrics==0.3.0
[pip3] torchtext==0.5.0
[pip3] torchvision==0.9.1
[conda] Could not collect