AjinkyaP1998 closed this issue 4 years ago.
The DrivenData blog is independent of MMF. If you plan to use MMF for the Hateful Memes challenge, please follow the instructions at https://mmf.sh/docs/challenges/hateful_memes_challenge. Let us know if you run into any issues.
❓ Questions and Help
The blog post at https://www.drivendata.co/blog/hateful-memes-benchmark/ says the following:
LOADING THE DATA
On the data download page, we provide everything you need to get started. Once you've downloaded and extracted the data, in addition to license.txt and README.md, you should see:
img.tar.gz is the directory of all the memes we'll be working with for training, validation, and testing. Once extracted, the images live in the img directory and are named by their unique identifier id, with a .png extension.
train.jsonl is a JSON Lines (.jsonl) file, with one JSON record per line, to be used for training. Each record has key-value pairs for the image id, the image filename img, the text extracted from the image, and of course the binary label: 0 is non-hateful and 1 is hateful.
dev.jsonl provides the same keys, for the validation split.
test.jsonl again has the same keys, with the exception of the label key.
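For illustration, here is a minimal Python sketch of reading one of these splits with the standard json module and checking each record against the keys described above. The blog only says "extracted text", so the key name "text" is an assumption. The helpers parse_jsonl and load_split, and the data/ paths in the usage note, are hypothetical and not part of MMF or the official starter code.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Keys named in the blog post; "text" as the name of the extracted-text
# field is an assumption, since the blog only says "extracted text".
REQUIRED_KEYS = {"id", "img", "text"}
LABEL_KEY = "label"  # present in train.jsonl and dev.jsonl, absent in test.jsonl


def parse_jsonl(lines: Iterable[str], expect_label: bool = True) -> Iterator[Dict]:
    """Yield one dict per non-blank JSON Lines record, checking its keys."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)  # each line is an independent JSON object
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        if expect_label and record.get(LABEL_KEY) not in (0, 1):
            raise ValueError(f"line {line_no}: '{LABEL_KEY}' must be 0 or 1")
        yield record


def load_split(path: str, expect_label: bool = True) -> List[Dict]:
    """Read a split file such as train.jsonl or dev.jsonl into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return list(parse_jsonl(f, expect_label))
```

Usage would look like train = load_split("data/train.jsonl") for the labelled splits and test = load_split("data/test.jsonl", expect_label=False) for the test split. The generator accepts any iterable of strings, so the same checks work on an open file or an in-memory list.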
About img.tar.gz (in bold in the blog): do we get this file from the download, or only after extraction? I don't have this file in the downloaded data. The downloaded data contains only: test.jsonl, train.jsonl, dev.jsonl, README.md, LICENSE, and an img folder.
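One way to cope with both layouts, an already-extracted img folder or an img.tar.gz archive, is sketched below using only pathlib and tarfile from the Python standard library. The function name ensure_image_dir is hypothetical. It assumes the archive's top-level folder is img/, which is consistent with the blog's description but not confirmed here.

```python
import tarfile
from pathlib import Path


def ensure_image_dir(data_dir: str) -> Path:
    """Return the path to the img/ directory, extracting img.tar.gz if needed.

    Hypothetical helper: assumes img.tar.gz, when present, unpacks to img/.
    """
    root = Path(data_dir)
    img_dir = root / "img"
    if img_dir.is_dir():
        return img_dir  # images already extracted, as in the download above
    archive = root / "img.tar.gz"
    if not archive.is_file():
        raise FileNotFoundError(f"neither {img_dir} nor {archive} exists")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(root)  # assumes the archive's top-level folder is img/
    if not img_dir.is_dir():
        raise FileNotFoundError(f"{archive} did not produce an img/ directory")
    return img_dir
```

The existing img folder is checked first. A download that already ships extracted images therefore works without the archive.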
In what format does the data need to be when training a model with MMF?