facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Cannot use mmf/utils/inference.py with existing configs #852

Closed junqi-jiang closed 1 year ago

junqi-jiang commented 3 years ago

1. Cannot use mmf/utils/inference.py with existing configs

I am working on a project that needs an interface for running inference on arbitrary images with the vilbert and visual_bert models for the hateful_memes challenge.

An interface was developed in this pull request: https://github.com/facebookresearch/mmf/pull/830. When I tried to instantiate an Inference object with the pretrained visual_bert checkpoint, the config had no such section:

self.model_items["config"].image_feature_encodings,

Could you please provide a copy of the checkpoint config file you used or give some suggestions?
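
For anyone hitting the same error, a minimal sketch of checking a loaded checkpoint config for the section that inference.py reads may help narrow things down. The helper name find_missing_keys and the contents of checkpoint_config are hypothetical placeholders; only the key image_feature_encodings comes from the line quoted above, and a real MMF config is not a plain dict.

```python
def find_missing_keys(config, required_keys):
    """Return the required top-level keys that are absent from config."""
    return [key for key in required_keys if key not in config]


# Hypothetical stand-in for the model config stored in a checkpoint.
checkpoint_config = {"model": "visual_bert"}

# The only key known from this issue to be read by inference.py.
required = ["image_feature_encodings"]

missing = find_missing_keys(checkpoint_config, required)
if missing:
    print(f"Checkpoint config lacks: {missing}")
```

If the list is non-empty, the checkpoint was probably saved with a config that predates or differs from the one the Inference class expects.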

[SOLVED] 2. Cannot use extract_features_frcnn.py to get image features

Also, I cannot find the lxmert config required for this file. I manually downloaded one from huggingface, but it was in JSON. [SOLVED: https://s3.amazonaws.com/models.huggingface.co/bert/unc-nlp/frcnn-vg-finetuned/config.yaml]

Cheers

junqi-jiang commented 3 years ago

Wrong link: it was this pull request https://github.com/facebookresearch/mmf/pull/798 that added the interface.

shivgodhia commented 3 years ago

Seconded. I'm further along and have gotten extract_features_frcnn to work and return features for an arbitrary file, but even then I am not yet able to use those features for inference with visual_bert. I get this error:

AttributeError: Key new_full not found in the SampleList. Valid choices are ['obj_ids', 'obj_probs', 'attr_ids', 'attr_probs', 'boxes', 'sizes', 'preds_per_image', 'roi_features', 'normalized_boxes']
Traceback:
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/env/lib/python3.8/site-packages/streamlit/script_runner.py", line 333, in _run_script
    exec(code, module.__dict__)
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/website/models/app.py", line 48, in <module>
    predictions = predictor.predict(image_path=input_image, text=input_text)
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/website/models/predict.py", line 130, in predict
    report = Report(sample_list, output)
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/env/lib/python3.8/site-packages/mmf/models/base_model.py", line 236, in __call__
    model_output = super().__call__(sample_list, *args, **kwargs)
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/env/lib/python3.8/site-packages/mmf/models/visual_bert.py", line 558, in forward
    sample_list = self.update_sample_list_based_on_head(sample_list)
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/env/lib/python3.8/site-packages/mmf/models/visual_bert.py", line 511, in update_sample_list_based_on_head
    image_dim_variable = sample_list["image_feature_0"].new_full(
File "/Users/shiv/Library/Mobile Documents/com~apple~CloudDocs/Dissertation/hateful-memes/env/lib/python3.8/site-packages/mmf/common/sample.py", line 164, in __getattr__
    raise AttributeError(

shivgodhia commented 3 years ago

@junqi-jiang I've worked on the code, and my version may solve some of your feature-extraction problems: https://github.com/hivestrung/mmf/commit/6332a9803721a9a230913b6e2589fed172ae4778

Please see my issue https://github.com/facebookresearch/mmf/issues/853#issue-849714848, which contains more information on using the models for prediction.

I still have trouble using the features generated by extract_features_frcnn to populate the sample's image_feature_0 field (that is, sample.image_feature_0).
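
One plausible reading of the traceback earlier in the thread: the "valid choices" it lists (obj_ids, boxes, roi_features, and so on) look like the whole output of the feature extractor, which suggests that the entire container was stored under image_feature_0 instead of a single tensor. The sketch below reproduces that failure mode without MMF or PyTorch. FakeTensor and DictWithAttrs are hypothetical stand-ins, the shapes are made up, and choosing roi_features as the field to pass on is an assumption, not something confirmed in this issue.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; keeps the sketch dependency-free."""

    def __init__(self, shape, fill=None):
        self.shape = tuple(shape)
        self.fill = fill

    def new_full(self, size, fill_value):
        # Mirrors the call made in the traceback: tensor.new_full(size, value).
        return FakeTensor(size, fill_value)


class DictWithAttrs(dict):
    """Hypothetical, simplified stand-in for a SampleList-style container."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(
                f"Key {key} not found. Valid choices are {list(self.keys())}"
            ) from None


# Made-up extractor output; only the key names come from the traceback.
frcnn_output = DictWithAttrs(
    boxes=FakeTensor((1, 100, 4)),
    roi_features=FakeTensor((1, 100, 2048)),
)

sample = DictWithAttrs()

# Failure mode: the whole container is stored, so .new_full is looked up as a key.
sample["image_feature_0"] = frcnn_output
try:
    sample["image_feature_0"].new_full((1,), 100)
    failed = False
except AttributeError as err:
    failed = True
    message = str(err)

# Possible fix (assumption): store only the tensor of region features.
sample["image_feature_0"] = frcnn_output["roi_features"]
image_dim = sample["image_feature_0"].new_full((1,), 100)
```

Whether the real model also needs the features reshaped or truncated to a fixed number of boxes is not settled here.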

junqi-jiang commented 3 years ago

@hivestrung Thanks! That's some great work to get the URL working. It seems I have downloaded the wrong file for the config lol.

To do inference, you can also look at mmf/models/interfaces/mmbt.py and the from_pretrained() function in mmf/models/mmbt.py. I have been using that to interface with models that take images as input, and I am extending it to vilbert and visual_bert. It's quite convenient.
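
A minimal sketch of that route, based on my recollection of the MMF hateful memes documentation rather than anything verified in this thread: the zoo key "mmbt.hateful_memes.images" and the classify(image, text) call are assumptions taken from that documentation, and classify_meme is a hypothetical wrapper name. The import sits inside the function so the snippet loads even where mmf is not installed.

```python
def classify_meme(image_path, text, zoo_key="mmbt.hateful_memes.images"):
    """Classify one image/text pair with a pretrained MMBT model.

    Assumes mmf is installed and that the zoo key above is still available;
    the first call downloads the checkpoint.
    """
    # Imported lazily: mmf is a heavy, optional dependency for this sketch.
    from mmf.models.mmbt import MMBT

    model = MMBT.from_pretrained(zoo_key)
    # The interface in mmf/models/interfaces/mmbt.py handles image loading
    # and text processing before calling the model.
    return model.classify(image_path, text)
```

Calling classify_meme("meme.png", "some caption") would then return the model's prediction; the file name and caption here are placeholders.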

vedanuj commented 3 years ago

cc @brettallenyo Can you please take a look at this?

brettallenyo commented 3 years ago

Hi @hivestrung @junqi-jiang, I am going to upload some models and configs to our model zoo in the coming days, so you will be able to download and use those for the mmf_transformer and visual_bert checkpoints.

junqi-jiang commented 3 years ago

@brettallenyo I would really appreciate that! Also, I have written an interface for visual_bert and vilbert to predict on raw images and texts, which solves this issue another way: https://github.com/junqi-jiang/mmf/commit/7f96f5b89ae14442656883ca3922d6a702beb85f

hackgoofer commented 2 years ago

@brettallenyo, do you mind linking the model zoo checkpoint files here for future reference?