facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Missing annotations file when running VISUALBert on COCO #421

Closed sTranaeus closed 4 years ago

sTranaeus commented 4 years ago

❓ Questions and Help

I want to pretrain VisualBERT on COCO again and tried running !mmf_run config=configs/datasets/coco/defaults.yaml model=visual_bert dataset=coco, but got a FileNotFoundError (shown below). I assumed I had made a mistake in my setup, but isn't mmf_run meant to download and set up the data locally if it isn't already there?


Overriding option config to configs/datasets/coco/defaults.yaml
Overriding option model to visual_bert
Overriding option datasets to coco
Using seed 55891710
Logging to: ./save/logs/train_2020_07_20T09_53_55.log
Traceback (most recent call last):
  File "/home/k1762177/hateful_memes/bin/mmf_run", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "/home/k1762177/mmf/mmf_cli/run.py", line 112, in run
    main(configuration, predict=predict)
  File "/home/k1762177/mmf/mmf_cli/run.py", line 41, in main
    trainer.load()
  File "/home/k1762177/mmf/mmf/trainers/mmf_trainer.py", line 38, in load
    super().load()
  File "/home/k1762177/mmf/mmf/trainers/base_trainer.py", line 38, in load
    self.load_datasets()
  File "/home/k1762177/mmf/mmf/trainers/mmf_trainer.py", line 62, in load_datasets
    self.dataset_loader.load_datasets()
  File "/home/k1762177/mmf/mmf/common/dataset_loader.py", line 17, in load_datasets
    self.train_dataset.load(self.config)
  File "/home/k1762177/mmf/mmf/datasets/multi_dataset_loader.py", line 118, in load
    self.build_datasets(config)
  File "/home/k1762177/mmf/mmf/datasets/multi_dataset_loader.py", line 135, in build_datasets
    dataset_instance = build_dataset(dataset, dataset_config, self.dataset_type)
  File "/home/k1762177/mmf/mmf/utils/build.py", line 107, in build_dataset
    dataset = builder_instance.load_dataset(config, dataset_type)
  File "/home/k1762177/mmf/mmf/datasets/base_dataset_builder.py", line 96, in load_dataset
    dataset = self.load(config, dataset_type, *args, **kwargs)
  File "/home/k1762177/mmf/mmf/datasets/builders/coco/builder.py", line 47, in load
    dataset = super().load(config, *args, **kwargs)
  File "/home/k1762177/mmf/mmf/datasets/builders/vqa2/builder.py", line 23, in load
    dataset = super().load(*args, **kwargs)
  File "/home/k1762177/mmf/mmf/datasets/mmf_dataset_builder.py", line 141, in load
    dataset = dataset_class(config, dataset_type, imdb_idx)
  File "/home/k1762177/mmf/mmf/datasets/builders/coco/dataset.py", line 12, in __init__
    config, dataset_type, imdb_file_index, dataset_name="coco", *args, **kwargs
  File "/home/k1762177/mmf/mmf/datasets/builders/vqa2/dataset.py", line 20, in __init__
    super().__init__(name, config, dataset_type, index=imdb_file_index)
  File "/home/k1762177/mmf/mmf/datasets/mmf_dataset.py", line 25, in __init__
    self.annotation_db = self.build_annotation_db()
  File "/home/k1762177/mmf/mmf/datasets/mmf_dataset.py", line 39, in build_annotation_db
    return AnnotationDatabase(self.config, annotation_path)
  File "/home/k1762177/mmf/mmf/datasets/databases/annotation_database.py", line 24, in __init__
    self.load_annotation_db(path)
  File "/home/k1762177/mmf/mmf/datasets/databases/annotation_database.py", line 30, in load_annotation_db
    self._load_npy(path)
  File "/home/k1762177/mmf/mmf/datasets/databases/annotation_database.py", line 47, in _load_npy
    self.db = np.load(path, allow_pickle=True)
  File "/home/k1762177/hateful_memes/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/k1762177/.cache/torch/mmf/data/datasets/coco/defaults/annotations/imdb_karpathy_train.npy'

vedanuj commented 4 years ago

Automated download is not enabled for the COCO datasets. We will add support for it.

sTranaeus commented 4 years ago

Thank you for the response! It's not clear to me if a change has been pushed or not. When rerunning the same command, on the latest version of mmf from source, I have the same issue. Should I be expecting a different behaviour?

sTranaeus commented 4 years ago

For what it's worth, there is some automatic downloading already happening, and I can see a lot of COCO data present:

$ ls .cache/torch/mmf/data/datasets/coco/defaults/features/ -lh
total 161G
drwxrwxr-x 2 k1762177 k1762177   4 May 18 23:24 test2015.lmdb
-rw-rw-r-- 1 k1762177 k1762177 64G Jul 14 18:19 test2015.tar.gz
drwxr-xr-x 2 k1762177 k1762177   4 May 18 07:39 trainval2014.lmdb
-rw-rw-r-- 1 k1762177 k1762177 97G Jul 14 21:19 trainval2014.tar.gz

The specific file that MMF reports as missing is (see the rest of the error output in the issue's first comment): FileNotFoundError: [Errno 2] No such file or directory: '/home/k1762177/.cache/torch/mmf/data/datasets/coco/defaults/annotations/imdb_karpathy_train.npy'

Is this file also enabled for automatic download?
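
A quick way to see what actually landed in the annotations folder is sketched below. This is only an illustration: describe_data_dir is a hypothetical helper, not part of MMF, and the cache path is the default location taken from the error message above.

```python
from pathlib import Path


def describe_data_dir(root):
    """Return a sorted list of (relative_path, size_in_bytes) for files under root.

    Hypothetical helper for inspection only; it is not an MMF function.
    Returns an empty list if the directory does not exist.
    """
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Assumed default cache location, copied from the FileNotFoundError above.
    coco_root = Path("~/.cache/torch/mmf/data/datasets/coco/defaults").expanduser()
    for name, size in describe_data_dir(coco_root / "annotations"):
        print(f"{size:>12}  {name}")
    expected = coco_root / "annotations" / "imdb_karpathy_train.npy"
    print("found" if expected.is_file() else "missing", expected)
```

An empty listing would suggest the annotations archive was never downloaded or extracted, as opposed to only this one file being absent.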

apsdehal commented 4 years ago

@sTranaeus Yes, it should be there, here is the direct link to download: https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/defaults/annotations/annotations.tar.gz in case you still haven't found it.

rm -rf /home/k1762177/.cache/torch/mmf/data/datasets/coco/defaults/annotations/
cd /home/k1762177/.cache/torch/mmf/data/datasets/coco/defaults/
mkdir annotations
cd annotations
wget https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/defaults/annotations/annotations.tar.gz

Then run your normal mmf command (don't extract the annotations manually).

sTranaeus commented 4 years ago

Thank you. I've gone through those commands, and am getting a checksum error now: AssertionError: [ Checksum for annotations.tar.gz from https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/defaults/annotations/annotations.tar.gz does not match the expected checksum. Please try again. ]

Should I just keep trying again? Or is there an issue with the annotations file stored at that link?
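
To tell a truncated or corrupted download apart from a problem with the hosted file, one option is to hash the local archive and compare the digest across repeated downloads or against a copy fetched on another machine. The sketch below assumes SHA-256; the thread does not say which hash MMF checks or what the expected value is, so treat both as unknowns. sha256_of_file is a hypothetical helper, not an MMF function.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper; SHA-256 is an assumption about the hash in use.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location of the archive after the wget step earlier in the thread.
    archive = Path(
        "~/.cache/torch/mmf/data/datasets/coco/defaults/annotations/annotations.tar.gz"
    ).expanduser()
    if archive.is_file():
        print(archive.stat().st_size, "bytes")
        print(sha256_of_file(archive))
    else:
        print("archive not found:", archive)
```

If two downloads give different digests or sizes, the transfer is likely being cut short; if they agree with each other but MMF still rejects the file, the mismatch is more likely on the side of the expected checksum.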

apsdehal commented 4 years ago

Let me try a fresh install and get back to you.

sTranaeus commented 4 years ago

@apsdehal still getting the error - any luck here?

apsdehal commented 4 years ago

@sTranaeus I did a fresh clone, ran MMF in isolation, and confirmed that this works: the data is set up correctly and the run only fails later, at the stage of loading the optimizer, because none is defined in that config.

conda create -n mmf_test python=3.7
conda activate mmf_test
cd ~
mkdir -p test
cd test
git clone https://github.com/facebookresearch/mmf.git
cd mmf
python setup.py develop
mmf_run config=configs/datasets/coco/defaults.yaml model=visual_bert dataset=coco

This downloaded all of the features and annotations and extracted them, and everything worked fine.

Also, if you are actually pretraining VisualBERT on COCO, you want dataset=masked_coco together with the proper project configuration from projects/visual_bert/configs/.

apsdehal commented 4 years ago

Closing, since we haven't heard back and we have tested that this works as expected. Please open a new issue if the problem persists.