Closed shivgodhia closed 3 years ago
Hello @hivestrung, thanks for your help on the feature extraction. I adapted the interfaces and used a pretrained visual_bert model to predict on raw images and text; see https://github.com/junqi-jiang/mmf/commit/9386ea117ec7aaa038aa0ca35c8fd8a3af7fbbbf. The sample-creating bits are at image_features.py::69-92.
In interfaces/image_features.py, the feature_list and info_list extracted by extract_features_frcnn.py are loaded, restructured into sample lists, and fed directly into the models. It just worked. The code is naive for now; I haven't integrated the feature extraction into the interface yet.
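For reference, the restructuring step described above can be sketched with plain dicts standing in for mmf's Sample/SampleList (build_sample is a hypothetical helper; the field names follow the commit linked above):

```python
import numpy as np
import torch

def build_sample(feat, info):
    # feat: numpy array of region features, as saved by extract_features_frcnn.py
    # info: dict with the bbox/size metadata produced alongside the features
    return {
        # features were saved as a numpy array, so convert back to a torch tensor
        "image_feature_0": torch.from_numpy(feat),
        "image_info_0": {
            "bbox": info["bbox"],
            "num_boxes": info["num_boxes"],
            "image_width": info["image_width"],
            "image_height": info["image_height"],
        },
    }

# Example with dummy data (100 boxes x 2048-dim FRCNN features):
sample = build_sample(
    np.zeros((100, 2048), dtype=np.float32),
    {"bbox": [[0, 0, 10, 10]], "num_boxes": 1, "image_width": 640, "image_height": 480},
)
```

In the real interface these dicts are Sample objects wrapped in a SampleList, but the field layout is the same.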
Also, I am not yet sure whether the image should be passed through the image_processor before extracting features, or whether the extracted bounding boxes need to go through the bbox_processor.
@junqi-jiang Man, thanks so much - your code helped me figure out what was wrong with mine. I can report that I've gotten on-the-fly predictions done without any file loading or saving :)
One thing to note for integrating everything nicely: when the features are saved, what is actually written is feat_list.cpu().numpy(), not feat_list itself, which I didn't notice before. Secondly, when loading, your code required me to apply torch.from_numpy to that array. I've applied both of those changes and it works.
I'm not sure im_info is even needed, though. It's good that your code includes it, but it seems unnecessary for inference?
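A minimal sketch of that save/load round trip (the file path and tensor shape here are illustrative, not taken from the extractor):

```python
import os
import tempfile

import numpy as np
import torch

feat_list = torch.rand(100, 2048)  # stand-in for the extracted features

# Saving: what actually hits disk is the CPU numpy array, not the tensor.
path = os.path.join(tempfile.gettempdir(), "feat_0.npy")
np.save(path, feat_list.cpu().numpy())

# Loading: convert back to a torch tensor before building the sample.
loaded = torch.from_numpy(np.load(path))
assert torch.equal(loaded, feat_list.cpu())
```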
Anyway, here's my code for your information. I used yours for loading and integrated it with my modified extract_features script (unchanged from my fork):
```python
elif self.model_info.use_features:
    features, full_features, im_feature_0, im_info_0 = self.feature_extractor.extract_features(
        image_dir=img, save_single=False
    )

    # Wrap the image-info dict in a Sample/SampleList, as the model expects.
    sample_im_info = Sample()
    sample_im_info.bbox = im_info_0["bbox"]
    sample_im_info.num_boxes = im_info_0["num_boxes"]
    sample_im_info.image_width = im_info_0["image_width"]
    sample_im_info.image_height = im_info_0["image_height"]
    sample_list_info = SampleList([sample_im_info])

    # Round-trip through numpy to match what the saved-features path produces.
    sample.image_feature_0 = torch.from_numpy(im_feature_0.cpu().numpy())
    sample.image_info_0 = sample_list_info
```
Hi, I have been working on inference using a number of the baseline models. I've gotten it to work well on image+text, but for VisualBert, because use_features is enabled, I need the features.
So I have been working on the extract_features_frcnn.py script and have modified much of it so it can generate the features on the fly. I can confirm that all of this works. The code is available in my fork here: https://github.com/hivestrung/mmf/commit/6332a9803721a9a230913b6e2589fed172ae4778. Please feel free to adapt it and use it to augment the main repo, but do credit me if you do.
Prediction code
Anyway, I have written a script for predicting: https://www.dropbox.com/s/0rlwqcod3mzdorn/predict.py?dl=0. It works on Image-Grid, Text BERT, Late Fusion and Concat BERT, and may well work on some other models too. I am working exclusively on the Hateful Memes dataset.
Here is a rough idea of how I am using the feature extractor. What should I set self.image_feature_0 to?
Errors
I tried to run the code above and this is what I got. I don't know how to use the features for inference.
Who can help
@brettallenyo as they wrote the feature extraction script, and @vedanuj who seems to know a lot about this stuff