google-research / meta-dataset

A dataset of datasets for learning to learn from few examples
Apache License 2.0

tfds doesn't work to get meta-dataset data #111

Closed brando90 closed 1 year ago

brando90 commented 1 year ago

I tried running the TFDS instructions and got the following error:

      volume={350},
      number={6266},
      pages={1332--1338},
      year={2015},
    }
    @misc{jongejan2016quick,
      title={The {Quick}, {Draw}! -- {A.I.} experiment},
      author={Jongejan, Jonas and Rowley, Henry and Kawashima, Takashi and Kim,
              Jongmin and Fox-Gieg, Nick},
      howpublished={\url{quickdraw.withgoogle.com}},
      year={2016}
    }
    @inproceedings{stallkamp2011german,
      author={Johannes Stallkamp and Marc Schlipsing and Jan Salmen and Christian Igel},
      booktitle={IEEE International Joint Conference on Neural Networks},
      title={The {G}erman {T}raffic {S}ign {R}ecognition {B}enchmark: A multi-class
             classification competition},
      year={2011},
      pages={1453--1460}
    }
    @inproceedings{nilsback2008automated,
      title={Automated flower classification over a large number of classes},
      author={Nilsback, Maria-Elena and Zisserman, Andrew},
      booktitle={2008 Sixth Indian Conference on Computer Vision, Graphics \& Image
                 Processing},
      pages={722--729},
      year={2008},
      organization={IEEE}
    }""",
)

WARNING[file_utils.py]: `tfds.core.as_path` is deprecated. Pathlib API has been moved to a
separate module. To migrate, use:

from etils import epath
path = epath.Path('gs://path/to/f.txt')

Alternatively `tfds.core.Path` is an alias of `epath.Path`.

Installation: `pip install etils[epath]`
INFO[build.py]: download_and_prepare for dataset meta_dataset/mscoco/1.0.0...
INFO[dataset_builder.py]: Generating dataset meta_dataset (/lfs/ampere4/0/brando9/tensorflow_datasets/meta_dataset/mscoco/1.0.0)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /lfs/ampere4/0/brando9/tensorflow_datasets/meta_dataset/mscoco/1.0.0...
INFO[download_manager.py]: Downloading http://images.cocodataset.org/annotations/annotations_trainval2017.zip into /lfs/ampere4/0/brando9/tensorflow_datasets/downloads/images.cocodat.org_annotat_annotat_trainvalb4uysGZomReJqvlUvYiqgwjQhBAC15ZyQJ0pCo-8_c.zip.tmp.cc975170950e49d7a2a95c8b807bfba7...
INFO[download_manager.py]: Downloading http://images.cocodataset.org/zips/train2017.zip into /lfs/ampere4/0/brando9/tensorflow_datasets/downloads/images.cocodataset.org_zips_train2017kf8fi7FBG8CRFD5M_1iGISWHMGlxy31RoQYQAApVnVY.zip.tmp.29bbec73ca1e46ecb7ea604681a3ad96...
Extraction completed...: 100%|███████████████████████████████████████████████████████████████████████████████| 118293/118293 [28:18<00:00, 69.63 file/s]
Dl Size...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 18682/18682 [28:18<00:00, 11.00 MiB/s]
Dl Completed...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [28:18<00:00, 849.47s/ url]
Traceback (most recent call last):
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/bin/tfds", line 8, in <module>
    sys.exit(launch_cli())
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/scripts/cli/main.py", line 104, in launch_cli
    app.run(main, flags_parser=_parse_flags)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/scripts/cli/main.py", line 99, in main
    args.subparser_fn(args)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/scripts/cli/build.py", line 233, in _build_datasets
    _download_and_prepare(args, builder)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/scripts/cli/build.py", line 435, in _download_and_prepare
    builder.download_and_prepare(
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/core/dataset_builder.py", line 600, in download_and_prepare
    self._download_and_prepare(
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1399, in _download_and_prepare
    future = split_builder.submit_split_generation(
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/core/split_builder.py", line 326, in submit_split_generation
    return self._build_from_generator(**build_kwargs)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow_datasets/core/split_builder.py", line 389, in _build_from_generator
    for key, example in utils.tqdm(
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/afs/cs.stanford.edu/u/brando9/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/data/tfds/example_generators.py", line 332, in generate_mscoco_examples
    image = image.convert(mode='RGB')
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/PIL/Image.py", line 889, in convert
    self.load()
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/PIL/ImageFile.py", line 226, in load
    seek(offset)
ValueError: seek of closed file
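
A note on the last frames above: `convert` calls `load`, which calls `seek` on a file that is already closed. That pattern is consistent with a deferred read running after its file handle was closed, although the traceback alone does not prove that this is what happens in `generate_mscoco_examples`. The sketch below reproduces the failure mode with the standard library only. `LazyPayload` and the two `read_*` helpers are hypothetical stand-ins written for illustration; they are not Pillow or meta-dataset code.

```python
import os
import tempfile


class LazyPayload:
    """Hypothetical stand-in for a lazily decoded image.

    It keeps the file handle and reads the bytes only when load() is called.
    """

    def __init__(self, fp):
        self.fp = fp
        self.data = None

    def load(self):
        if self.data is None:
            self.fp.seek(0)  # raises ValueError if fp is already closed
            self.data = self.fp.read()
        return self.data


def read_lazily_after_close(path):
    """Failing pattern: the deferred read happens after the handle is closed."""
    with open(path, "rb") as fp:
        payload = LazyPayload(fp)
    return payload.load()


def read_eagerly_before_close(path):
    """Working pattern: force the read while the handle is still open."""
    with open(path, "rb") as fp:
        payload = LazyPayload(fp)
        payload.load()
    return payload.load()  # served from memory, no file access needed


def demo():
    """Run both patterns on a temporary file; return (error message, bytes)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"fake image bytes")
        try:
            read_lazily_after_close(path)
            error = None
        except ValueError as exc:
            error = str(exc)
        data = read_eagerly_before_close(path)
        return error, data
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If the same thing is happening with the PIL image here, forcing the pixel data to load (or doing the `convert`) while the underlying file object is still open would be the analogous fix. That is an assumption about the generator code, not something verified against it.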

What I ran:

# - set up the file descriptor limit or the remaining tfds cmds won't work
# check soft limit, output should be 1024
ulimit -Sn
# check hard limit, output should be 1024
ulimit -Hn
# increase the limit
ulimit -n 120000
# check soft limit, output should be 120000
ulimit -Sn
# check hard limit, output should be 120000
ulimit -Hn

# - The only manual intervention required is to download the ILSVRC 2012 training data (ILSVRC2012_img_train.tar) into TFDS's manual download directory (e.g. ~/tensorflow_datasets/downloads/manual/).
mkdir -p $HOME/tensorflow_datasets/downloads/manual/
wget https://.../ILSVRC2012_img_train.tar -O $HOME/tensorflow_datasets/downloads/manual/ILSVRC2012_img_train.tar
# check the size of the .tar file, should be 138G
ls -lh $HOME/tensorflow_datasets/downloads/manual/ILSVRC2012_img_train.tar

# - First, make sure that meta_dataset and its dependencies are installed. This can be done with one of the approaches at the top of this file. Not copy-pasting them here to avoid maintaining two copies of the same commands.
# pip install & reqs.txt...

# - Generate the tfrecord files associated with all data sources and store them in ~/tensorflow_datasets/meta_dataset
# check the size of the .tar file again, should be 138G
ls -lh $HOME/tensorflow_datasets/downloads/manual/ILSVRC2012_img_train.tar
# check that the ImageNet tar file is where it should be, i.e. <MANUAL_DIR> is the directory where ILSVRC2012_img_train.tar was downloaded. Check that it's there and that it's 138G, matching the check above.
ls -lh $HOME/tensorflow_datasets/downloads/manual
# cd into the tfds dir & run the required command
cd $HOME/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/data/tfds
tfds build md_tfds --manual_dir=$HOME/tensorflow_datasets/downloads/manual
# TODO: this step takes... hours (started 8pm Jan 10th...ended...)

brando90 commented 1 year ago

https://github.com/google-research/meta-dataset/blob/main/meta_dataset/data/tfds/README.md

lamblin commented 1 year ago

It looks like the MSCOCO data source is no longer working, as you reported in #108.

I'll close this one for now, we can reopen it if there are other TFDS issues remaining.