facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Do we get a file named "img.tar.gz" in the Hateful Memes Dataset? #504

Closed: AjinkyaP1998 closed this issue 4 years ago

AjinkyaP1998 commented 4 years ago

❓ Questions and Help

The blog post at https://www.drivendata.co/blog/hateful-memes-benchmark/ says the following:

LOADING THE DATA

On the data download page, we provide everything you need to get started. Once you've downloaded and extracted the data, in addition to the license.txt and README.md you should see:

- img.tar.gz is the directory of all the memes we'll be working with for training, validation, and testing. Once extracted, images live in the img directory and have unique identifier ids as filenames, .png
- train.jsonl is a .jsonl file, which is a list of json records, to be used for training. Each record had key-value pairs for an image id, filename img, extracted text from the image, and of course the image binary label. 0 is non-hateful and 1 is hateful.
- dev.jsonl provides the same keys, for the validation split.
- test.jsonl again has the same keys, with the exception of the label key.
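Since each split file is JSON Lines (one JSON record per line), a minimal Python sketch of reading one may help. It assumes the keys are named `id`, `img`, `text` and `label`, following the quoted description. The sample records below are hypothetical placeholders, not real dataset entries, and the helper names are illustrative only.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def read_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def label_counts(records: List[Dict]) -> Counter:
    """Count labels (0 = non-hateful, 1 = hateful); unlabeled (test) records are skipped."""
    return Counter(r["label"] for r in records if "label" in r)


# Hypothetical records in the assumed format; real ids, paths and text differ.
train_lines = [
    '{"id": 1, "img": "img/00001.png", "text": "some meme text", "label": 0}',
    '{"id": 2, "img": "img/00002.png", "text": "other meme text", "label": 1}',
    '{"id": 3, "img": "img/00003.png", "text": "more meme text", "label": 0}',
]
test_lines = ['{"id": 4, "img": "img/00004.png", "text": "unlabeled meme"}']

train = read_jsonl(train_lines)
test = read_jsonl(test_lines)
print(label_counts(train))  # Counter({0: 2, 1: 1})
print(all("label" not in r for r in test))  # True
```

With a real file, the same function accepts an open file handle, e.g. `with open("train.jsonl", encoding="utf-8") as f: records = read_jsonl(f)`, since iterating a text file yields one line at a time.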


The file mentioned above, img.tar.gz: do we get this file in the download itself, or only after extraction? I did not find this file in the downloaded data. The downloaded data contains: test.jsonl, train.jsonl, dev.jsonl, README.md, License, and an img folder.
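Whether the images arrive as an `img.tar.gz` archive or as an already extracted `img` folder, a short Python sketch can normalize the layout before training. This is an illustrative helper, not part of MMF or of the dataset download: the function names `ensure_img_dir` and `missing_images` are hypothetical, and the sketch assumes the archive unpacks into a top-level `img/` directory and that each record's `img` value is a path relative to the dataset root.

```python
import tarfile
from pathlib import Path
from typing import Dict, Iterable, List


def ensure_img_dir(root: Path) -> Path:
    """Return root/"img", extracting root/"img.tar.gz" first if only the archive exists.

    Hypothetical helper: assumes the archive's members live under a top-level img/ directory.
    """
    img_dir = root / "img"
    archive = root / "img.tar.gz"
    if img_dir.is_dir():
        return img_dir  # already extracted, as in the download described in this issue
    if archive.is_file():
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(root)  # only extract archives from a trusted source
        if img_dir.is_dir():
            return img_dir
    raise FileNotFoundError(f"neither {img_dir} nor a usable {archive} was found")


def missing_images(root: Path, records: Iterable[Dict]) -> List[str]:
    """List record["img"] paths (assumed relative to root) that do not exist on disk."""
    return [r["img"] for r in records if not (root / r["img"]).is_file()]
```

Checking for the folder first means the helper does nothing when the download already contains `img/`, so it is safe to call on either layout.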

In what format does the data need to be when training a model with MMF?

apsdehal commented 4 years ago

The DrivenData blog is independent of MMF. If you plan to use MMF for Hateful Memes, please follow the instructions at https://mmf.sh/docs/challenges/hateful_memes_challenge. Let us know if you run into any issues with them.