Hi again! Please use a single issue to continue discussion on the same topic. You can extract the zip yourself, but you will then need to bring it into the MMF format, which is what the utility takes care of. You can read the code of mmf_convert_hm at https://github.com/facebookresearch/mmf/blob/master/mmf_cli/hm_convert.py to replicate what it does.
In simple words: the utility extracts the annotation files to
$USER/.cache/torch/mmf/data/datasets/hateful_memes/defaults/annotations/
and moves the img folder to
$USER/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images.
This will make sure the existing configs in MMF work as-is. If you can replicate this locally, it should work.
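For reference, here is a minimal local replication of those steps, a sketch only (the real logic lives in the hm_convert.py file linked above). The scratch directory, the zip's internal layout, and the stdlib zipfile usage are assumptions on my part; in particular, if the archive is AES-encrypted, the stdlib zipfile cannot decrypt it and a library such as pyzipper would be needed:

```python
import os
import shutil
import zipfile

# Rough sketch of the steps described above -- NOT the actual hm_convert.py
# code. Scratch dir, top-level .jsonl layout inside the zip, and stdlib
# zipfile use are assumptions.
base = os.path.expanduser(
    "~/.cache/torch/mmf/data/datasets/hateful_memes/defaults"
)
scratch = "/tmp/hm_extracted"  # arbitrary scratch location

with zipfile.ZipFile("/content/hm.zip") as zf:
    zf.extractall(scratch, pwd=b"<password>")  # substitute the real password

# Move the annotation .jsonl files under defaults/annotations/ ...
ann_dir = os.path.join(base, "annotations")
os.makedirs(ann_dir, exist_ok=True)
for name in os.listdir(scratch):
    if name.endswith(".jsonl"):
        shutil.move(os.path.join(scratch, name), ann_dir)

# ... and the img folder under defaults/images, as described above.
img_dir = os.path.join(base, "images")
os.makedirs(img_dir, exist_ok=True)
shutil.move(os.path.join(scratch, "img"), img_dir)
```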
Can we use the MMF library in a Jupyter Notebook? If yes, how are we supposed to execute this command:
mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes run_type=train_val
In a separate code cell, run:
!mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes run_type=train_val
Note the ! at the start.
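If you'd rather avoid IPython's ! magic, a plain-Python equivalent (assuming mmf_run is on the notebook environment's PATH) is:

```python
import subprocess

# Same invocation as the `!mmf_run ...` cell above, run via subprocess.
subprocess.run(
    [
        "mmf_run",
        "config=projects/hateful_memes/configs/mmbt/defaults.yaml",
        "model=mmbt",
        "dataset=hateful_memes",
        "run_type=train_val",
    ],
    check=True,  # raise if training exits with a non-zero status
)
```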
After executing the code:
!mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes run_type=train_val
I get the following error:
ModuleNotFoundError: No module named 'mmf.trainers.callbacks'
Is it something to do with my MMF library installation?
Can you restart your runtime and try again?
Yes, I did. Still getting the same error.
It is hard to say what exactly might be causing the issue. We have a tutorial on using HM in Colab already at: https://colab.research.google.com/github/facebookresearch/mmf/blob/notebooks/notebooks/mmf_hm_example.ipynb. Please follow that one.
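As a quick sanity check (a suggestion of the editor, not from the thread), importing the missing module directly in a fresh cell can narrow this down; if it still fails, the installed mmf build simply doesn't ship that module and reinstalling from the GitHub repo is worth trying:

```python
# If this import raises the same ModuleNotFoundError, the installed mmf
# build does not contain mmf.trainers.callbacks (e.g. a stale or partial
# install), independent of anything else in the notebook.
import mmf.trainers.callbacks  # noqa: F401
print("mmf.trainers.callbacks imported successfully")
```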
Where is the unzipped data stored on Google Colab after executing the following code:
!mmf_convert_hm --zip_file /content/hm.zip --password *****************
Please see the summary in the recent comment above, which describes how the zip is extracted, to understand where the files go.
I have followed the procedure (https://colab.research.google.com/github/facebookresearch/mmf/blob/notebooks/notebooks/mmf_hm_example.ipynb) and downloaded the dataset. I've even unzipped it by executing the following command in Google Colab:
!mmf_convert_hm --zip_file /content/hm.zip --password $password
I have the annotation files at the location
$USER/.cache/torch/mmf/data/datasets/hateful_memes/defaults/annotations/
and the img folder in
$USER/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images.
But Google Colab asks me to download the data again. Why is it not able to use the dataset already present at the above-mentioned locations? I get an error saying: AssertionError: Hateful Memes Dataset doesn't do automatic downloads; please follow instructions at https://fb.me/hm_prerequisites
And now the exact opposite case is happening:
I deleted the .cache folder just to check whether it gets created again after I run the unzipping command in Google Colab.
Now I don't find any such folder created there, yet the code runs fine in Google Colab and even generates results for the Hateful Memes challenge. During unzipping, Google Colab shows a path like /root/.cache/torch/.........
Can you please explain whether Google Colab unzips this dataset on the local machine or somewhere on the cloud? Why does the .cache folder get created only when I run the unzipping command on cmd and not when I run it in Google Colab? If I wish to get that folder again, do I need to rerun the unzipping command on my cmd?
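A quick way to check which machine actually holds the files is a small path probe, run once locally and once in a Colab cell; this sketch (an editorial illustration, not from the thread) just uses the paths already mentioned above:

```python
from pathlib import Path

# Run this both on your local machine and in a Colab cell: the folders will
# only exist on the machine where mmf_convert_hm was actually executed.
# Colab cells run on a hosted cloud VM, so /root/.cache there is separate
# from the .cache on your own computer.
base = Path.home() / ".cache/torch/mmf/data/datasets/hateful_memes/defaults"
for sub in ("annotations", "images"):
    path = base / sub
    print(f"{path}: {'exists' if path.exists() else 'missing'}")
```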
❓ Questions and Help
In the prerequisites of the Facebook Hateful Memes Challenge, the command
mmf_convert_hm --zip_file=x --password=y
is mentioned. If this is just a simple extraction of zipped files, why do we need to execute it using this command? Can we just unzip it with ordinary unzipping software? Secondly, after executing the command, the zipped file at location x goes missing and is extracted somewhere. What is that location?