greeneggsandyaml closed this issue 1 year ago
Supporting `predict` for other models like VilBERT/VisualBERT is not on our immediate roadmap. However, we encourage you to submit a PR for this, and we can help.
Thanks for the reply! This is something I can work on in the next few weeks. To clarify, how exactly should I extract the features for VilBERT/VisualBERT? It's not clear to me what exact pretrained network was used and how the features were extracted. Thanks!
For starters, how about creating a Colab demo for these models? Here are some pointers:
The `build_processors` method can build the processors you need for processing the text. Have a look at MMBT's HM Inference example and you will understand. Let us know if something isn't clear or if you need more help. :) Looking forward to your contribution.
@apsdehal @vedanuj I was going through the code at https://github.com/facebookresearch/mmf/blob/master/tools/scripts/features/extract_features_vmb.py and wanted to understand why it is not part of the MMF framework itself and is instead kept separately under tools. Is there a reason behind this?
@GunjanChhablani The feature extraction code depends on maskrcnn benchmark, which requires a specific setup that we don't want to include in our main dependencies yet. That's why it is kept separately in tools.
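For anyone wondering how a standalone tool script can depend on a package that the main install does not pull in, here is a minimal sketch of one common pattern: check for the optional package at runtime and fail with a clear message if it is missing. This is only an illustration, not MMF's actual code. `require_optional` is a hypothetical helper, and the module name `maskrcnn_benchmark` in the comment is an assumption about how the package is imported.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a top-level optional module, or raise an ImportError with a hint.

    Hypothetical helper for illustration. Pass a top-level module name only;
    dotted names can raise inside find_spec when the parent package is missing.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this script but is not installed. "
            f"{install_hint}"
        )
    return importlib.import_module(module_name)


# Assumed usage inside a feature extraction script (module name is an assumption):
# maskrcnn = require_optional(
#     "maskrcnn_benchmark",
#     "See the feature extraction instructions for the required setup.",
# )
```

With this pattern the core package imports cleanly without the heavy dependency, and only the script that needs it pays the setup cost.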
Hi @apsdehal, Thank you so much for replying.
Hi @vedanuj, @apsdehal can I work on this?
I'm a beginner, can you help please?
Hello MMF authors,
Thank you for your nice repo. I'm new to the repo and I'm literally looking for the simplest thing: I'd like to run masked language modeling inference with one of your pretrained masked captioning models on a new image. This should be super super simple, but I'm not seeing how to do it.
For example, I'm looking to make a simple helper function that will take a PIL image and a caption string like "A train leaving a [MASK]", and get the result of VilBERT/VisualBERT. To do this, I think I need to extract features from the image in the same way that you extracted them for COCO/CC pretraining. Do you provide the feature extraction code (I probably just missed it)? Once I've extracted the features, how exactly should I preprocess the text data and input all the data to the model?
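To make the request concrete, here is a rough sketch of the shape of helper I have in mind. Everything in it is a placeholder: `extract_features` and `score_tokens` stand in for the real feature extractor and the pretrained model, neither is an MMF API, and the whitespace split is a simplification of the real tokenizer (BERT-style models use WordPiece).

```python
from typing import Any, Callable, List, Sequence, Tuple

MASK = "[MASK]"


def split_caption(caption: str) -> Tuple[List[str], List[int]]:
    """Whitespace-split a caption and return the tokens plus the [MASK] positions."""
    tokens = caption.split()
    mask_positions = [i for i, tok in enumerate(tokens) if tok == MASK]
    return tokens, mask_positions


def fill_masks(
    image: Any,
    caption: str,
    extract_features: Callable[[Any], Any],  # placeholder for the image feature extractor
    score_tokens: Callable[[Any, List[str]], Sequence[Sequence[float]]],  # placeholder model
    vocab: Sequence[str],
) -> str:
    """Replace each [MASK] with the highest-scoring vocabulary entry."""
    tokens, mask_positions = split_caption(caption)
    if not mask_positions:
        raise ValueError("caption contains no [MASK] token")
    features = extract_features(image)
    # Expected: one score vector per token, each of length len(vocab).
    scores = score_tokens(features, tokens)
    filled = list(tokens)
    for pos in mask_positions:
        row = scores[pos]
        best = max(range(len(vocab)), key=lambda j: row[j])
        filled[pos] = vocab[best]
    return " ".join(filled)


# Dummy stand-ins, only to show the data flow end to end.
vocab = ["station", "tunnel", "banana"]
dummy_features = lambda image: [[0.0] * 4]
dummy_scores = lambda feats, toks: [[0.1, 0.7, 0.2] for _ in toks]
print(fill_masks(None, "A train leaving a [MASK]", dummy_features, dummy_scores, vocab))
# prints: A train leaving a tunnel
```

The parts I don't know how to fill in are the two placeholders: which network produces the image features, and how the text and features should be packed for the model.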
Note: I see that the MMBT model has a helpful `predict` interface, but the other models do not (I think?).
Thank you for all your help and your great work!