orbennatan closed this issue 5 years ago
@orbennatan You want to use the name of the checkpoint by itself, i.e. --checkpoint=/path/to/model.ckpt-30000, rather than --checkpoint=/path/to/model.ckpt-30000.data-00000-of-00001. TensorFlow automatically picks up the data, index and meta checkpoint files tied to that name.
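As a rough sketch of what that means in practice: if you only have the full file name at hand, you can strip the suffix to get back to the checkpoint name. This assumes the usual suffixes written next to a checkpoint (.data-NNNNN-of-NNNNN, .index, .meta). The helper name checkpoint_prefix is hypothetical; it is not part of TensorFlow or of this repository.

```python
import re

# Assumed suffixes of the files stored alongside a checkpoint name:
# data shards, the index file, and the meta graph file.
_CHECKPOINT_SUFFIX = re.compile(r"\.(data-\d{5}-of-\d{5}|index|meta)$")


def checkpoint_prefix(path: str) -> str:
    """Return the bare checkpoint name for a path that may point at one of its files.

    Hypothetical helper for illustration only. Paths that already are a
    bare checkpoint name are returned unchanged.
    """
    return _CHECKPOINT_SUFFIX.sub("", path)


print(checkpoint_prefix("/path/to/model.ckpt-30000.data-00000-of-00001"))
# prints /path/to/model.ckpt-30000
```

The resulting string is what you would pass to --checkpoint.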
Tried the suggestion and it fixed this particular problem. It is still not running to completion, but I will open another issue if necessary. Thank you so much for the quick response.
I copied the whole project to my Google Drive, mounted the drive on Google Colab and ran the program according to the README file. Training went fine and produced 30000 checkpoint files, so I assume it went OK with the following cell:

!python3 experiment.py --data_dir="/content/drive/My Drive/Colab Notebooks/MNISTForColab" --summary_dir="/content/drive/My Drive/Colab Notebooks/MNISTForColab" --max_steps=30000 --dataset=mnist --batch_size=128 --shift=2
Next I ran the following command:

!python3 experiment.py --data_dir="/content/drive/My Drive/Colab Notebooks/MNISTForColab" --train=False --checkpoint="/content/drive/My Drive/Colab Notebooks/MNISTForColab/summary_190106/190106-1413/train/model.ckpt-30000.data-00000-of-00001" --summary_dir="/content/drive/My Drive/Colab Notebooks/MNISTForColab" --eval_set=test --eval_size=80000 --eval_shard=0

And received the following error:

2019-01-07 15:59:26.726068: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /content/drive/My Drive/Colab Notebooks/MNISTForColab/summary_190106/190106-1413/train/model.ckpt-30000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2019-01-07 15:59:26.730417: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /content/drive/My Drive/Colab Notebooks/MNISTForColab/summary_190106/190106-1413/train/model.ckpt-30000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2019-01-07 15:59:26.730528: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_tensor.cc:175 : Data loss: Unable to open table file /content/drive/My Drive/Colab Notebooks/MNISTForColab/summary_190106/190106-1413/train/model.ckpt-30000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

It would be great if you could shed some light on the source of the problem. I would love to contribute the notebooks later for public use. You may send replies directly to or.bennatan@gmail.com. Thank you.