facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Colab zip extraction for Hateful Memes fails #886

Closed jdsteen627 closed 1 year ago

jdsteen627 commented 3 years ago

🐛 Bug

The hateful memes notebook example fails due to issues with extracting the zip file.

To Reproduce

Steps to reproduce the behavior:

  1. Open the example (https://colab.research.google.com/github/facebookresearch/mmf/blob/notebooks/notebooks/mmf_hm_example.ipynb)
  2. Run the cells up to number 5
  3. Running cell 5 as is leads to a checksum error
  4. Adding --bypass_checksum=1 avoids checksum issue, but leads to "AssertionError: dev.jsonl doesn't exist in /root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images"
  5. (some) later cells will work, but any requiring the extracted data fail with "AssertionError: Hateful Memes Dataset doesn't do automatic downloads; please follow instructions at https://fb.me/hm_prerequisites"

[Screenshots: failed_extraction, failed_training]

apsdehal commented 3 years ago

We are looking into this. Thanks for pointing out the issue.

apsdehal commented 3 years ago

Hi, can you please install the latest version of MMF using:

pip uninstall -y mmf
pip install mmf@https://github.com/facebookresearch/mmf/tarball/master

That should fix this issue.

davkis123 commented 3 years ago

Still getting AssertionError: train.jsonl doesn't exist in /root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images #887

Noman-Tanveer commented 3 years ago

I'm also facing the same issue. I have a fresh install from the directory.

woquinocoin commented 3 years ago

@Noman-Tanveer I found that this happens because the mmf_convert_hm script adds /data to the path while unzipping. Hence, the structure of the zip file should look like hateful_memes/data/img or hateful_memes/data/test.jsonl. Here is the relevant source code for the path:

for file in files_needed:
    exists = exists or PathManager.exists(os.path.join(folder, "data", file))
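
To see which layout a given archive actually has before running the conversion, its entry names can be listed directly. The following is a minimal Python sketch using only the standard zipfile module; the helper names top_level_dirs and has_entry are hypothetical and not part of MMF.

```python
import zipfile


def top_level_dirs(zip_path):
    """Return the set of top-level folder names found in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Entry names inside a zip file use "/" as the separator.
    return {name.split("/", 1)[0] for name in names if "/" in name}


def has_entry(zip_path, folder, filename):
    """Check whether the file `folder/filename` is stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return f"{folder}/{filename}" in zf.namelist()
```

If top_level_dirs reports hateful_memes rather than data, the quoted check for files under "data" would presumably not find them.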

lmwijesundara commented 3 years ago

I'm having the same issue. @woquinocoin did this work?

lmwijesundara commented 3 years ago

I rearranged and compressed the files as mentioned, but it didn't work for me.

Noman-Tanveer commented 3 years ago

For me, the archive was extracted into the "hateful_memes" folder instead of the "data" directory that the code points to. I changed exists = exists or PathManager.exists(os.path.join(folder, "data", file)) to exists = exists or PathManager.exists(os.path.join(folder, "hateful_memes", file)) and it worked.
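
Rather than hard-coding one folder name, the same check could try several candidates. This is only a sketch of that idea, not MMF's code: it uses os.path.exists in place of PathManager.exists, and find_data_subfolder and its default candidates are made up for illustration.

```python
import os


def find_data_subfolder(folder, files_needed, candidates=("data", "hateful_memes")):
    """Return the first candidate subfolder of `folder` that contains any
    of `files_needed`, or None if no candidate matches."""
    for sub in candidates:
        # Same "any needed file exists" logic as the quoted line, per candidate.
        if any(os.path.exists(os.path.join(folder, sub, f)) for f in files_needed):
            return sub
    return None
```

Calling find_data_subfolder(folder, ["train.jsonl"]) on an extraction that landed in "hateful_memes" would return that name, which avoids editing the installed source for each layout.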

mingshanhee commented 3 years ago

Thanks @Noman-Tanveer! Your solution worked for me. They probably changed the directory name containing the data without making modifications to the cli methods.

woquinocoin commented 3 years ago

@lmwijesundara Try using this dataset, which I re-zipped: https://drive.google.com/file/d/1qkp3G7Ua_ePz1tBKI6tF2tjCqf6_O2d-/view?usp=sharing

AnjumJ123 commented 2 years ago

I am getting the same issue. Even though train.jsonl is in the path /root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images, the system is failing the assertion. Any help or suggestion to solve the issue will be appreciated.

AnjumJ123 commented 2 years ago

Can anybody please help me fix this issue? Please refer to this link for more details: https://github.com/rizavelioglu/hateful_memes-hate_detectron/issues/3#issue-1209723871

Hao-YunDeng commented 2 years ago

I'm having the same issue. Waiting for solutions

karembadawy commented 2 years ago

@AnjumJ123 @Hao-YunDeng Try unzipping the file, renaming the folder from hateful_memes to data, and then compressing it again. This works for me.

DanielLin97 commented 1 year ago

@woquinocoin Sorry, the link to the re-zipped dataset no longer works. Could you please share the dataset with me? I need the images in it. I promise to use it only for research purposes and not to distribute it to any third parties.

Eryk1705 commented 1 year ago

1308

himi11 commented 11 months ago

@AnjumJ123 were you able to fix it? I'm facing the same issue.

mr-rhombus commented 11 months ago

I was battling through these same issues last night. This is what worked for me.

  1. Download hateful memes data from here
  2. Unzip "hateful_memes.zip"
  3. Rename "hateful_memes" dir to "data"
  4. Re-zip "data" dir to "data.zip"
  5. Run mmf_convert_hm --zip_file=data.zip --password=password --bypass_checksum=1 to "mmf-ify" the data (per the "Prerequisites" section here)
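
Steps 3 and 4 above can also be scripted. The sketch below assumes the archive has already been unzipped (step 2) into a folder named hateful_memes; repack_as_data_zip is a hypothetical helper, not an MMF function. It renames the folder in place and writes data.zip next to it.

```python
import shutil
from pathlib import Path


def repack_as_data_zip(extracted_dir):
    """Rename the extracted folder to "data" and zip it as data.zip.

    Returns the path of the new archive, whose entries all start with "data/".
    """
    src = Path(extracted_dir).resolve()
    renamed = src.with_name("data")
    src.rename(renamed)  # step 3: hateful_memes -> data
    # Step 4: create <parent>/data.zip containing the "data" folder itself.
    archive = shutil.make_archive(
        str(renamed), "zip", root_dir=str(renamed.parent), base_dir="data"
    )
    return Path(archive)
```

The resulting data.zip is the file that step 5 passes to mmf_convert_hm via --zip_file.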

FWIW, I'm on a PC and using mmf=1.0.0rc12. I installed mmf according to the mmf docs, specifically with the pip install --editable . option (which came with its own set of challenges)