I also don't really understand what transforms are applied to each image as it is loaded. What exactly happens to each .png image?
@apsdehal Sorry to bother you, but would you be able to give me a quick rundown of the preprocessing done on the image file (and the text, but I think I understand the text processing) for the Hateful Memes dataset specifically? Thank you in advance, I really appreciate it.
Hi @hivestrung. You can check how the inference API has been implemented; it should be easy to extend to your models. You will need to build the processors, pass your image and text (wrapped in a SampleList) through those processors, and then pass the processor output to the model. Check this out:
https://github.com/facebookresearch/mmf/blob/master/mmf/utils/inference.py
@vedanuj Thanks for your reply. I did check it out, and I understand the SampleList part, but it's a little trickier than that: how to process the image (from taking the path to the .png, through the transforms, to turning it into a tensor and putting it into a SampleList) seems to vary for each model.
Also, the inference API assumes there is an image processor. That is not true for ConcatBert, at least; there is only a text processor. What Brett has done works for the text processing, but images are different.
Currently I am working on extending it to ConcatBert, and this is what has worked for the image processing. By "worked" I mean that it actually runs and gives me a prediction; it remains to be seen whether I have processed the images properly, so that the predictions are in line with what you'd expect from ConcatBert.
It's actually not very clear what the image transforms for the Hateful Memes dataset are.
from PIL import Image
import torchvision

from mmf.common.sample import Sample, SampleList

# Load the meme and make sure it has three channels
img = Image.open(image_path)
img = img.convert("RGB")

# Question: are there other transforms that should be applied here?
transform_list = [torchvision.transforms.ToTensor()]
transform = torchvision.transforms.Compose(transform_list)
img = transform(img)

# text_output is the dict returned by the text processor for the caption
sample = Sample(text_output)
sample.text = text_output
sample.image = img

sample_list = SampleList([sample])
predictions = model(sample_list)
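To turn those predictions into a hatefulness score, I'm assuming the model output is a dict with raw logits under a "scores" key and that class index 1 means "hateful" (both assumptions worth double-checking against the dataset's label mapping). A minimal sketch under those assumptions:

import torch

# predictions is the dict returned by model(sample_list) above.
# Assumption: predictions["scores"] has shape [batch_size, num_classes]
# and class index 1 corresponds to "hateful".
logits = predictions["scores"]
probs = torch.nn.functional.softmax(logits, dim=1)
hateful_probability = probs[0, 1].item()
print(f"Hatefulness score: {hateful_probability:.3f}")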
It's actually not very clear what the image transforms for the Hateful Memes dataset are.
Here are the image transforms we use for the Hateful Memes models.
You can build this image processor and pass your images to that.
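As a rough sketch of what such a torchvision-based image pipeline can look like (the exact transforms, crop size, and normalization statistics below are placeholders, not necessarily the values in the MMF dataset config, so check the Hateful Memes config in the repo for the real ones):

import torchvision.transforms as T
from PIL import Image

# Assumed pipeline: resize, center-crop, convert to tensor, normalize.
# The mean/std here are ImageNet statistics used only as an example.
image_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open(image_path).convert("RGB")
image_tensor = image_transform(img)  # shape [3, 224, 224]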
Wow this is perfect, thank you so much!
@vedanuj One question: I notice that there is a text processor there too, but it is different from what I see in the configs when I build the model.
For example, for ConcatBERT I see this:
{'hateful_memes': {'processors': {'text_processor': {'type': 'bert_tokenizer',
'params': {'tokenizer_config': {'type': 'bert-base-uncased',
'params': {'do_lower_case': True}},
'mask_probability': 0,
'max_seq_length': 128}}}}}
Which text processor should I use?
Yes, for ConcatBERT you should use this processor. Is that causing any issues?
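As a rough sketch of how that bert_tokenizer config could be turned into a processor and run on a caption (the import path, the constructor argument, and the {"text": ...} call convention are assumptions based on how MMF processors are usually used, so verify them against the MMF source):

from omegaconf import OmegaConf
from mmf.datasets.processors.bert_processors import BertTokenizer

# Assumption: the processor takes the 'params' section of the config above.
text_processor_config = OmegaConf.create({
    "tokenizer_config": {
        "type": "bert-base-uncased",
        "params": {"do_lower_case": True},
    },
    "mask_probability": 0,
    "max_seq_length": 128,
})
text_processor = BertTokenizer(text_processor_config)

# Assumed call convention: MMF text processors take a dict with a "text" key
# and return a dict of tensors (input_ids, input_mask, segment_ids, ...),
# which is the text_output used in the earlier snippet.
text_output = text_processor({"text": "caption of the meme goes here"})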
Nope, no issues - just trying to make sure I'm using the right processors for everything. Thanks for the help!
I want to use the ConcatBert, Late Fusion, Text Bert, Image Grid and Visualbert COCO models for inference on Hateful Memes, by building a website around those models that can take a jpeg/png and a caption and spit out a hatefulness score.
I read this https://github.com/facebookresearch/mmf/issues/364
I still don't really understand how to do this. It should be simple but it's really difficult.
Here is my code as it is now: