One option is to use customized TensorFlow Datasets. The other is to replace tfds with your own tf.data pipeline, e.g. replace https://github.com/google-research/simclr/blob/01ddaf0bd692ee945dad7ff5fb07b26df1b9edbe/data.py#L133 with something like the following:
dataset = tf.data.Dataset.list_files(pattern)
dataset = dataset.interleave(tf.data.TFRecordDataset, cycle_length=num_readers, block_length=1)
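For instance, here is a minimal sketch of a full pipeline (the feature keys in the parser are hypothetical; adjust them to whatever schema your TFRecords use):

import tensorflow as tf

def make_dataset(pattern, num_readers=4):
  # List all TFRecord shards matching the glob pattern.
  dataset = tf.data.Dataset.list_files(pattern, shuffle=True)
  # Read several shards in parallel, one record at a time from each.
  dataset = dataset.interleave(
      tf.data.TFRecordDataset, cycle_length=num_readers, block_length=1)

  def parse_example(serialized):
    # Hypothetical schema: an encoded JPEG plus an integer label.
    features = tf.io.parse_single_example(
        serialized,
        {'image': tf.io.FixedLenFeature([], tf.string),
         'label': tf.io.FixedLenFeature([], tf.int64)})
    image = tf.io.decode_jpeg(features['image'], channels=3)
    return image, features['label']

  return dataset.map(
      parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)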
Thanks. Can we run this code on Google Colab so that we can make use of the TPU on Colab?
It should be able to run on Colab with some modification. You're welcome to try, and I will link your Colab if you make it work.
There are some Colab examples in the https://github.com/google-research/simclr/tree/master/colabs folder for fine-tuning (not pretraining).
Thanks. I will try it for pre-training.
Hi, I was able to run pretraining on Colab using a TPU for the CIFAR dataset, with model_dir and data_dir pointing to a Google Cloud Storage bucket (output in this link: https://pastebin.com/i1608kRp).
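For anyone trying the same, a command along these lines should work (flag names follow the repo's README; the bucket paths and TPU address below are placeholders):

python run.py --train_mode=pretrain \
  --train_batch_size=512 --train_epochs=100 --learning_rate=1.0 \
  --dataset=cifar10 --image_size=32 --eval_split=test \
  --data_dir=gs://<your-bucket>/tensorflow_datasets \
  --model_dir=gs://<your-bucket>/simclr_model \
  --use_tpu=True --tpu_name=<your-tpu-address>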
But is there a way to make pretraining work on Colab without using a GCS bucket? I am aware that tpu_estimator expects model_dir to be a GCS path and not a local path. Can we use some other function as a replacement for tpu_estimator?
One possibility is to convert the tpu_estimator setup to a Keras model and use keras model.fit (inside with strategy.scope():, where strategy = tf.distribute.experimental.TPUStrategy(resolver)). But the pretraining stage doesn't use the usual training process; it uses a custom training process, so I am not sure whether the tpu_estimator code can be converted to a Keras model. Please let me know if there is a way to do this that doesn't use storage buckets; a rough sketch of the TPUStrategy idea is below.
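Here is roughly what I have in mind, as a minimal sketch of a custom training loop under TPUStrategy (TF 2.x, >= 2.2). The tiny model, classification loss, and random data are placeholders, not SimCLR's ResNet or contrastive objective:

import tensorflow as tf

# Connect to the Colab TPU and create the strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

global_batch_size = 1024

with strategy.scope():
  # Placeholder model; SimCLR would build its ResNet + projection head here.
  model = tf.keras.Sequential([
      tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
      tf.keras.layers.Dense(10)])
  optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
      from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

@tf.function
def train_step(iterator):
  def step_fn(inputs):
    images, labels = inputs
    with tf.GradientTape() as tape:
      logits = model(images, training=True)
      # SimCLR's contrastive loss would replace this classification loss.
      loss = tf.nn.compute_average_loss(
          loss_fn(labels, logits), global_batch_size=global_batch_size)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
  strategy.run(step_fn, args=(next(iterator),))

# Dummy in-memory data, just to show that no GCS bucket is involved here.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([2048, 32, 32, 3]),
     tf.random.uniform([2048], maxval=10, dtype=tf.int64))
).repeat().batch(global_batch_size, drop_remainder=True)
iterator = iter(strategy.experimental_distribute_dataset(dataset))
for _ in range(10):
  train_step(iterator)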
I am also not sure if there's another way here, sorry.
Ok. Thank you for your quick response!
Hey @chentingpc, I'm trying to do the same, and you'd mentioned changes in data.py. But won't we have to make the changes in run.py first, since we're loading tfds there? Specifically, here in run.py (line 341):
builder = tfds.builder(FLAGS.dataset, data_dir=FLAGS.data_dir)
If you're not using tfds, you can ignore/remove the tfds.builder call. Otherwise you can simply change the name of the dataset by selecting one from https://www.tensorflow.org/datasets/catalog/overview.
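Note that the scripts also read metadata from that builder (e.g. split sizes), so if you remove it you'll need to supply those values yourself. A hypothetical stand-in could look like the following (the attribute names mirror the tfds API, e.g. builder.info.splits['train'].num_examples, but check your copy of run.py / data.py for the exact fields it reads):

class _Split(object):
  def __init__(self, num_examples):
    self.num_examples = num_examples

class _Info(object):
  def __init__(self, num_train, num_eval):
    # Keyed by split name, like tfds's builder.info.splits.
    self.splits = {'train': _Split(num_train), 'test': _Split(num_eval)}

class FakeBuilder(object):
  def __init__(self, num_train, num_eval):
    self.info = _Info(num_train, num_eval)

# Example: a custom dataset with 50k train / 10k eval examples.
builder = FakeBuilder(num_train=50000, num_eval=10000)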
Can you share the code from your Colab TPU run above? I haven't been successful in using a custom dataset. Thanks.
Hi @chentingpc, can you please post instructions/guidelines for using the code on a custom dataset? Any tips on the specific usage of the code would also be helpful. Thanks.