Closed shivgodhia closed 3 years ago
Hello @hivestrung, thanks for your help on the feature extraction. I adapted the interfaces and used a pretrained visual_bert model to predict on raw images and text; see https://github.com/junqi-jiang/mmf/commit/9386ea117ec7aaa038aa0ca35c8fd8a3af7fbbbf. The sample-creating bits are at image_features.py::69-92.
In interfaces/image_features.py, the feature_list and info_list extracted by extract_features_frcnn.py are loaded, restructured into sample lists, and fed directly into the models. It just worked. The code is naive for now; I haven't integrated the feature extraction into the interface yet.
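For reference, the restructuring step described above can be sketched with plain dicts standing in for mmf's Sample/SampleList (build_sample is a hypothetical helper; the field names follow the commit linked above):

```python
import numpy as np
import torch

def build_sample(feat, info):
    # feat: numpy array of region features, as saved by extract_features_frcnn.py
    # info: dict with the bbox/size metadata produced alongside the features
    return {
        # features were saved as a numpy array, so convert back to a torch tensor
        "image_feature_0": torch.from_numpy(feat),
        "image_info_0": {
            "bbox": info["bbox"],
            "num_boxes": info["num_boxes"],
            "image_width": info["image_width"],
            "image_height": info["image_height"],
        },
    }

# Example with dummy data (100 boxes x 2048-dim FRCNN features):
sample = build_sample(
    np.zeros((100, 2048), dtype=np.float32),
    {"bbox": [[0, 0, 10, 10]], "num_boxes": 1, "image_width": 640, "image_height": 480},
)
```

In the real interface these dicts are Sample objects wrapped in a SampleList, but the field layout is the same.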
Also, I am not yet sure whether the image should be passed through the image_processor before extracting features, or whether the extracted bounding boxes need to go through the bbox_processor.
@junqi-jiang Man, thanks so much - your code helped me figure out what was wrong with mine. I can report that I've gotten on-the-fly predictions done without any file loading or saving :)
One thing to note for integrating everything nicely: when the features are saved, what is actually written is feat_list.cpu().numpy(), not feat_list itself, which I didn't notice before. Secondly, when loading, your code required me to apply torch.from_numpy to that array. I've applied both of those changes and it works.
I'm not sure im_info is even needed, though. It's good that your code includes it, but it seems unnecessary for inference?
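A minimal sketch of that save/load round trip (the file path and tensor shape here are illustrative, not taken from the extractor):

```python
import os
import tempfile

import numpy as np
import torch

feat_list = torch.rand(100, 2048)  # stand-in for the extracted features

# Saving: what actually hits disk is the CPU numpy array, not the tensor.
path = os.path.join(tempfile.gettempdir(), "feat_0.npy")
np.save(path, feat_list.cpu().numpy())

# Loading: convert back to a torch tensor before building the sample.
loaded = torch.from_numpy(np.load(path))
assert torch.equal(loaded, feat_list.cpu())
```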
Anyway, here's my code for your information. I used yours for loading and integrated it with my modified extract_features script (unchanged from my fork):
```python
elif self.model_info.use_features:
    features, full_features, im_feature_0, im_info_0 = self.feature_extractor.extract_features(
        image_dir=img, save_single=False
    )

    # Wrap the image-info dict in a Sample/SampleList, as the model expects.
    sample_im_info = Sample()
    sample_im_info.bbox = im_info_0["bbox"]
    sample_im_info.num_boxes = im_info_0["num_boxes"]
    sample_im_info.image_width = im_info_0["image_width"]
    sample_im_info.image_height = im_info_0["image_height"]
    sample_list_info = SampleList([sample_im_info])

    # Round-trip through numpy to match what the saved-features path produces.
    sample.image_feature_0 = torch.from_numpy(im_feature_0.cpu().numpy())
    sample.image_info_0 = sample_list_info
```
Hi, I have been working on inference using a number of the baseline models. I've gotten it to work well on image+text, but for VisualBert, because use_features is enabled, I need the features.
So I have been working on the extract_features_frcnn.py script and have modified much of it so it can generate the features on the fly. I can confirm that all of this works. The code is available in my fork here: https://github.com/hivestrung/mmf/commit/6332a9803721a9a230913b6e2589fed172ae4778. Please feel free to adapt it and use it to augment the main repo, but do credit me if you do.
Prediction code
Anyway, I have written a script for predicting: https://www.dropbox.com/s/0rlwqcod3mzdorn/predict.py?dl=0. It works on Image-Grid, Text BERT, Late Fusion and Concat BERT, and may well work on some other models too. I am working exclusively on the Hateful Memes dataset.
Here is a rough idea of how I am using the feature extractor. What should I set self.image_feature_0 to?
Errors
I tried to run the code above and this is what I got. I don't know how to use the features for inference.
Who can help
@brettallenyo as they wrote the feature extraction script, and @vedanuj who seems to know a lot about this stuff