facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

For inference: How do I load images? #840

Closed shivgodhia closed 3 years ago

shivgodhia commented 3 years ago

I want to use the ConcatBert, Late Fusion, Text Bert, Image Grid, and Visualbert COCO models for inference on Hateful Memes, by building a website around them that takes a jpeg/png and a caption and returns a hatefulness score.

I read this https://github.com/facebookresearch/mmf/issues/364

I still don't really understand how to do this. It should be simple but it's really difficult.

Here is my code as it is now


import argparse
from collections import OrderedDict

import numpy as np
import torch
from PIL import Image

from mmf.common.registry import registry
from mmf.utils.build import build_config, build_processors
from mmf.utils.configuration import Configuration
from mmf.utils.env import setup_imports

# GPU control
GPU_MODE = False

# Paths to the trained checkpoint and its model config
PATH = "concat_bert_final.pth"
CONFIG_PATH = "../../mmf/projects/hateful_memes/configs/concat_bert/defaults.yaml"

opts_list = [
    "checkpoint.resume_file=concat_bert_final.pth",
    "config=mmf/projects/hateful_memes/configs/concat_bert/defaults.yaml",
    "model=concat_bert",
]

# Build the configuration from the command-line style overrides above
args = argparse.Namespace(config_override=None)
args.opts = opts_list
configuration = Configuration(args, load_dataset=False)
configuration.args = args
config = configuration.get_config()
config.start_rank = 0
config.device_id = 0

setup_imports()
configuration.import_user_dir()
config = build_config(configuration)

# Look up the registered model class and instantiate it from its config
model_name = config.model
model_class = registry.get_model_class(model_name)
if model_class is None:
    raise RuntimeError(f"No model registered for name: {model_name}")
model = model_class(config.model_config[model_name])

if torch.cuda.is_available():
    torch.cuda.set_device(config.device_id)
    torch.cuda.init()

model.load_requirements()
model.build()

# Load the checkpoint; strip the 'module.' prefix that DataParallel adds
state_dict = torch.load(PATH) if GPU_MODE else torch.load(
    PATH, map_location=torch.device("cpu"))
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove 'module.' of DataParallel
    new_state_dict[name] = v
model.load_state_dict(new_state_dict)
model.eval()

# Build the processors declared in the dataset config
dataset_name = list(config.dataset_config.keys())[0]
processor = build_processors(config.dataset_config[dataset_name].processors)

# Inputs for the model
# TODO: generalise
text = {"text": "blah blah blah"}
image_path = "./59420.png"
text_output = processor["text_processor"](text)
img = np.array(Image.open(image_path))
img = torch.as_tensor(img)
shivgodhia commented 3 years ago

I also don't really understand what transforms are applied to each image as it is loaded. What exactly happens to each .png image?

@apsdehal Sorry to bother you, but would you be able to give me a quick rundown of the preprocessing done on the image file (and the text, but I think I understand the text processing) for the Hateful Memes dataset specifically? Thank you in advance, I really appreciate it.

vedanuj commented 3 years ago

Hi @hivestrung. You can check how the inference API has been implemented; it should be easy to extend to your models. You will need to build the processors, pass your image and text wrapped in a SampleList to those processors, and feed the processor output to the model. Check this out:

https://github.com/facebookresearch/mmf/blob/master/mmf/utils/inference.py
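
For example, a minimal sketch of that pattern, assuming the model and processor objects built in the snippet above (the image tensor here is a placeholder, since image preprocessing is model-specific):

from mmf.common.sample import Sample, SampleList

# Run the raw inputs through the processors, wrap the outputs in a Sample,
# and batch them with a SampleList before calling the model.
sample = Sample()
sample.update(processor["text_processor"]({"text": "a meme caption"}))
sample.image = image_tensor  # placeholder: a preprocessed image tensor
sample_list = SampleList([sample])
output = model(sample_list)  # mmf models return a dict of outputs, e.g. scores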

shivgodhia commented 3 years ago

@vedanuj Thanks for your reply. I did check it out, and I understand the part about SampleLists, but it's a little trickier than that: how to process the image (from taking in the path to the png, to the transforms, to turning it into a tensor and putting it into a SampleList) seems to vary for each model.

Also, the inference API assumes images have a processor. This is not true for ConcatBert, at least: there is only a text processor. What Brett has done works for the text processing, but images are different.

Currently I am extending it to ConcatBert, and this is what has worked for the image processing. By "worked" I mean that it runs and gives me a prediction; it remains to be seen whether I have processed the images properly, so that the predictions are in line with what you'd expect from ConcatBert.

What exactly are the image transforms for the Hateful Memes dataset?

from PIL import Image
import torchvision

from mmf.common.sample import Sample, SampleList

img = Image.open(image_path)
img = img.convert("RGB")
# Question: are there other transforms to be applied?
transform_list = [torchvision.transforms.ToTensor()]
transform = torchvision.transforms.Compose(transform_list)
img = transform(img)

# Initialise the sample from the text processor output (which already
# contains the tokenized fields), attach the image, then batch and run.
sample = Sample(text_output)
sample.image = img
sample_list = SampleList([sample])
predictions = model(sample_list)
vedanuj commented 3 years ago

What exactly are the image transforms for the Hateful Memes dataset?

Here are the image transforms we use for hateful memes models.

https://github.com/facebookresearch/mmf/blob/master/mmf/configs/datasets/hateful_memes/defaults.yaml#L46

You can build this image processor and pass your images to that.
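
A minimal sketch of building and applying it, assuming the config object from the earlier snippet and that the dataset config includes the image_processor entry from the linked defaults.yaml:

from PIL import Image
from mmf.utils.build import build_processors

# build_processors returns every processor declared in the dataset config,
# including the torchvision-based image_processor for hateful_memes.
processors = build_processors(config.dataset_config["hateful_memes"].processors)
image_processor = processors["image_processor"]

img = Image.open("./59420.png").convert("RGB")
image_tensor = image_processor(img)  # resized/normalized tensor for a Sample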

shivgodhia commented 3 years ago

Wow this is perfect, thank you so much!

shivgodhia commented 3 years ago

@vedanuj One question: I notice there is a processor for text there too, and it's different from what I see in the config when I build the model.

For example, for ConcatBERT I see this:


{'hateful_memes': {'processors': {'text_processor': {'type': 'bert_tokenizer',
    'params': {'tokenizer_config': {'type': 'bert-base-uncased',
      'params': {'do_lower_case': True}},
     'mask_probability': 0,
     'max_seq_length': 128}}}}}

Which text processor should I use?

vedanuj commented 3 years ago

Yes, for ConcatBERT you should use this processor. Is that causing any issues?
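
For reference, a minimal sketch of building that processor on its own, directly from the params printed above (build_processors does the equivalent when given the full dataset config):

from omegaconf import OmegaConf
from mmf.common.registry import registry

# Params copied from the ConcatBERT config snippet earlier in the thread.
params = OmegaConf.create({
    "tokenizer_config": {
        "type": "bert-base-uncased",
        "params": {"do_lower_case": True},
    },
    "mask_probability": 0,
    "max_seq_length": 128,
})
text_processor = registry.get_processor_class("bert_tokenizer")(params)
processed = text_processor({"text": "a meme caption"})  # input_ids, masks, etc.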

shivgodhia commented 3 years ago

Nope, no issues - just trying to make sure I'm using the right processors for everything. Thanks for the help!