Closed sayakpaul closed 2 years ago
You are on the right track, but there is still one step missing.
After manually downloading the dataset, you need to run tfds once to reformat the data. We provide a script for this: `big_vision/tools/download_tfds_datasets.py`.
As indicated in the README, to launch data formatting on a TPU machine you could run:

```shell
gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "TFDS_DATA_DIR=gs://imagenet-1k/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.tools.download_tfds_datasets imagenet2012"
```
Alternatively, you can run the util directly on your local machine, assuming the local machine has access to the cloud bucket.
Let us know whether it works for you. Leaving the issue open for now.
Thank you! Giving it a try right now.
```shell
gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "TFDS_DATA_DIR=gs://imagenet-1k/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.tools.download_tfds_datasets imagenet2012"
```
leads to:
```
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/spsayakpaul/big_vision/tools/download_tfds_datasets.py", line 43, in <module>
    app.run(main)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/spsayakpaul/big_vision/tools/download_tfds_datasets.py", line 39, in main
    tfds.load(name=d, data_dir="~/tensorflow_datasets/", download=True)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/load.py", line 325, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_builder.py", line 462, in download_and_prepare
    self._download_and_prepare(
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1157, in _download_and_prepare
    split_generators = self._split_generators(  # pylint: disable=unexpected-keyword-arg
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/image_classification/imagenet.py", line 223, in _split_generators
    train_path = os.path.join(dl_manager.manual_dir, 'ILSVRC2012_img_train.tar')
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 152, in __get__
    cached = self.fget(obj)  # pytype: disable=attribute-error
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/download/download_manager.py", line 649, in manual_dir
    raise AssertionError(
AssertionError: Manual directory /home/spsayakpaul/tensorflow_datasets/downloads/manual does not exist or is empty. Create it and download/extract dataset artifacts in there using instructions:
manual_dir should contain two files: ILSVRC2012_img_train.tar and
ILSVRC2012_img_val.tar.
You need to register on http://www.image-net.org/download-images in order
to get the link to download the dataset.
```
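Per the error message, TFDS refuses to build until both tarballs sit in its manual directory. A small pre-flight check can catch this before launching the long build; this is an illustrative helper (the `missing_artifacts` name and the default path are my own, taken from the error above):

```python
import os

# File names come from the TFDS imagenet2012 manual-download instructions.
REQUIRED = ["ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar"]

def missing_artifacts(manual_dir, required=REQUIRED):
    """Return the required ImageNet tarballs that are absent from manual_dir."""
    return [f for f in required
            if not os.path.isfile(os.path.join(manual_dir, f))]

# Directory TFDS reported in the AssertionError; adjust to your setup.
manual_dir = os.path.expanduser("~/tensorflow_datasets/downloads/manual")
print(missing_artifacts(manual_dir))
```

If the printed list is non-empty, download or copy the named tarballs into the manual directory before retrying the formatting step.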
Will running the following help?
```python
import tensorflow_datasets as tfds

data_dir = "gs://imagenet-1k/tensorflow_datasets"
builder = tfds.builder("imagenet2012", data_dir=data_dir)
builder.download_and_prepare()
```
The error in https://github.com/google-research/big_vision/issues/2#issuecomment-1122519532 is expected, I think, since `data_dir` is already set here.
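For what it's worth, the precedence at play can be sketched as follows: an explicit `data_dir` argument wins over the `TFDS_DATA_DIR` environment variable, which is why the script's hardcoded `"~/tensorflow_datasets/"` masks the value exported in the gcloud command. This is an illustrative helper mimicking that resolution order, not the actual tfds code:

```python
import os

def resolve_data_dir(explicit=None, default="~/tensorflow_datasets"):
    """Roughly how TFDS picks a data dir: an explicit argument first,
    then the TFDS_DATA_DIR environment variable, then the default."""
    if explicit is not None:
        return explicit
    return os.environ.get("TFDS_DATA_DIR", os.path.expanduser(default))
```

So overriding the variable inside the script (or dropping the explicit `data_dir=` so the environment variable applies) is what makes the bucket path take effect.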
yeah, sorry, you likely need to manually override that variable as you suggested.
Let me know if you eventually succeed. In any case, once I have time, I will update the readme with well-tested instructions to get imagenet data to work.
Sure!
I am currently running this:
https://github.com/google-research/big_vision/issues/2#issuecomment-1122533163
Update: this is the current error (I faced one regarding `imagenet2012_real` but was able to quickly resolve it):
```
11 01:50:04.274455 139742227639360 logging_logger.py:44] Constructing tf.data.Dataset imagenet_v2 for split _EvenSplit(split='test', index=0, count=1, drop_remainder=False), from gs://imagenet-1k/tensorflow_datasets/imagenet_v2/matched-frequency/3.0.0
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/spsayakpaul/big_vision/train.py", line 372, in <module>
    app.run(main)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/spsayakpaul/big_vision/train.py", line 270, in main
    evaluators = eval_common.from_config(
  File "/home/spsayakpaul/big_vision/evaluators/common.py", line 37, in from_config
    evaluator = module.Evaluator(model, **cfg)
  File "/home/spsayakpaul/big_vision/evaluators/classification.py", line 34, in __init__
    self.ds, self.steps = input_pipeline.make_for_inference(
  File "/home/spsayakpaul/big_vision/input_pipeline.py", line 97, in make_for_inference
    data, _ = get_dataset_tfds(dataset=dataset, split=split,
  File "/home/spsayakpaul/big_vision/input_pipeline.py", line 53, in get_dataset_tfds
    return builder.as_dataset(
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/logging/__init__.py", line 81, in decorator
    return function(*args, **kwargs)
  File "/home/spsayakpaul/bv_venv/lib/python3.8/site-packages/tensorflow_datasets/core/dataset_builder.py", line 565, in as_dataset
    raise AssertionError(
AssertionError: Dataset imagenet_v2: could not find data in gs://imagenet-1k/tensorflow_datasets. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.
```
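The message means `imagenet_v2` was never prepared into the bucket, even though `imagenet2012` was. Before launching training, one could sanity-check which dataset directories already exist under the data dir; this is a local-filesystem sketch with an illustrative helper name (for a `gs://` path you would list via `tf.io.gfile` instead of `os`):

```python
import os

def prepared_datasets(data_dir, names):
    """Return the dataset names that already have a directory under data_dir."""
    return sorted(n for n in names if os.path.isdir(os.path.join(data_dir, n)))
```

Any name the config's evaluators reference but this check does not report still needs its own `download_and_prepare()` pass.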
Currently doing (after installing `tfds-nightly`):

```python
import tensorflow_datasets as tfds

data_dir = "gs://imagenet-1k/tensorflow_datasets"
ds = tfds.load("imagenet_v2", data_dir=data_dir, download=True)
```
It seems to be taking longer than expected, but I will keep updating anyway. I am maintaining a log here: https://gist.github.com/sayakpaul/9544d3ba935805bd47d71fd8596e7bc0 (not yet complete).
Looks like I was able to get things up and running:
I have also updated the gist I mentioned in https://github.com/google-research/big_vision/issues/2#issuecomment-1123162532.
Keeping it open until the training completes.
Was able to reproduce everything (76.23% on the ImageNet-1k validation set) with 90 epochs of pre-training on a TPU v3-8, which took 7 hours 22 minutes in total:
The following repository contains everything including the updated instructions, training logs, and the checkpoints:
I have followed the instructions from the README. I have set up a TPU v3-8 machine, which can be confirmed below:
I have hosted the ImageNet-1k (`imagenet2012`) dataset in a separate bucket, and it's structured like below (following the instructions from here). While launching training, I am using the following command:
It results in the following:

Is there anything I'm missing here?