Closed tbarnier closed 7 years ago
Same issue here. I am using FloydHub to run my training with the following command:
floyd run --gpu --env tensorflow-1.0:py2 --data uGo93wTLTC6yyKb7c4W7Ri 'python facenet/src/facenet_train_classifier.py --data_dir /input/senadores --logs_base_dir /output/facenet_logs --models_base_dir /output/facenet_models --pretrained_model /input/20170216-091149/model.ckpt'
Before running I uploaded my dataset and the pretrained model using the floyd init and floyd upload commands.
Also, I have tried training with the original model file name (model-20170216-091149.ckpt-250000.data-00000-of-00001), as well as after renaming it to model.ckpt, just in case.
My training finishes successfully if I don't set the --pretrained_model parameter. That is, the call below works just fine:
floyd run --gpu --env tensorflow-1.0:py2 --data uGo93wTLTC6yyKb7c4W7Ri 'python facenet/src/facenet_train_classifier.py --data_dir /input/senadores --logs_base_dir /output/facenet_logs --models_base_dir /output/facenet_models'
The only difference between the two training calls is that the second one doesn't set the --pretrained_model parameter.
Here is the error:
INFO - W tensorflow/core/framework/op_kernel.cc:993] Data loss: Unable to open table file /input/20170216-091149/model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
INFO - DataLossError (see above for traceback): Unable to open table file /input/20170216-091149/model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
INFO - tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /input/20170216-091149/model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2017-04-28 11:13:00,100 INFO - [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
2017-04-28 11:13:00,101 INFO - [[Node: save/RestoreV2_86/_575 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_968_save/RestoreV2_86", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
So, how can I train a model with the --pretrained_model parameter set?
confirmed with v1.1.0
Hi,
I'm not sure which file you are trying to load. I recently tried loading a pretrained model with train_tripletloss.py and it worked fine. There I used
--pretrained_model ~/models/20170214-092102/model-20170214-092102.ckpt-80000
But it should be noted that there is no file named model-20170214-092102.ckpt-80000 in the 20170214-092102 directory; TensorFlow appends .data-00000-of-00001 before restoring. These are the files in the directory:
-rwxrwxrwx 1 root root 96689276 feb 14 18:49 model-20170214-092102.ckpt-80000.data-00000-of-00001
-rwxrwxrwx 1 root root 22478 feb 14 18:49 model-20170214-092102.ckpt-80000.index
-rwxrwxrwx 1 root root 19991968 feb 14 09:29 model-20170214-092102.meta
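In other words, the checkpoint "name" passed to --pretrained_model is just a prefix, and TensorFlow resolves it to the sharded files on disk. A minimal plain-Python sketch of that mapping (the helper name is mine, not a TensorFlow API; it assumes a single-shard V2 checkpoint like the listing above):

```python
def expected_checkpoint_files(prefix):
    """Given a checkpoint prefix (the value passed to --pretrained_model),
    return the filenames TensorFlow actually reads from disk.
    Assumes a single-shard V2 checkpoint, as in the directory listing above."""
    return [prefix + ".data-00000-of-00001", prefix + ".index"]

for f in expected_checkpoint_files("model-20170214-092102.ckpt-80000"):
    print(f)
# model-20170214-092102.ckpt-80000.data-00000-of-00001
# model-20170214-092102.ckpt-80000.index
```

This is why passing the .data-00000-of-00001 file itself fails with "not an sstable": the restore op then looks for a file with the suffix appended twice.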
@davidsandberg Hi David, when I try to load the pretrained model, I find there are 3 sets of similar files. I'm confused about which one I should use. Can you give me some idea?
Many thanks.
If you look at the filenames, I believe that you are seeing checkpoints - the model state at different points during training. Choosing the latest is probably a good idea - so model-20171104-092733.ckpt-151000.
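Picking the latest checkpoint can also be done programmatically by parsing the global step out of the filenames (inside TensorFlow you could instead call tf.train.latest_checkpoint on the directory). A rough sketch, assuming the model-<name>.ckpt-<step> naming used in this thread; the helper is hypothetical, not part of facenet:

```python
import re

def latest_checkpoint_prefix(filenames):
    """Return the checkpoint prefix with the highest global step, given a
    directory listing. Relies on the <prefix>.ckpt-<step>.index naming
    convention seen in this thread."""
    best_step, best_prefix = -1, None
    for name in filenames:
        m = re.match(r"(.*\.ckpt-(\d+))\.index$", name)
        if m and int(m.group(2)) > best_step:
            best_step, best_prefix = int(m.group(2)), m.group(1)
    return best_prefix

files = [
    "model-20171104-092733.ckpt-150000.index",
    "model-20171104-092733.ckpt-150000.data-00000-of-00001",
    "model-20171104-092733.ckpt-151000.index",
    "model-20171104-092733.ckpt-151000.data-00000-of-00001",
]
print(latest_checkpoint_prefix(files))
# model-20171104-092733.ckpt-151000
```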
Hello, I'm using the 20180402-114759 model, and I got the same issue.
I'm sure that my path is right
--pretrained_model /home/constantine/DataDisk/models/facenet/20180402-114759/model-20180402-114759.ckpt-275.data-00000-of-00001
and I have that
DataLossError (see above for traceback): Unable to open table file /home/constantine/DataDisk/models/facenet/20180402-114759/model-20180402-114759.ckpt-275.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]] [[Node: save/RestoreV2/_907 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Anyone know how to fix this?
@StepOITD Did you find a solution for this issue? I get the same error when I try to export my trained model following the script here.
I found a modified version in a PR, and it worked.
Hi @StepOITD, I am getting the same error. Please help me figure it out.
@StepOITD, could you please post how you tackled the issue? I get the same error even though all the files for multiple checkpoints are available in the same directory.
@anubhav0fnu I tried the 2018 model version, restoring model-20180408-102900.ckpt-90.data-00000-of-00001. I just passed the argument --pretrained_model model-20180408-102900.ckpt-90 and it works for me. I read some topics on Stack Overflow saying that tf.train.Saver's restore function appends "data-00000-of-00001" to the end of the model path automatically.
As @theiron97 said, there is no need to append "data-00000-of-00001" to the file name; just provide the filename without that part.
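One way to guard against this mistake is to normalize the user-supplied path by stripping the shard or index suffix before handing it to the saver. A small defensive sketch; the helper name is mine and not part of facenet:

```python
import re

def checkpoint_prefix(path):
    """Strip a trailing .index or .data-XXXXX-of-XXXXX suffix from a
    checkpoint path, leaving the bare prefix that --pretrained_model
    (and Saver.restore) actually expects. Hypothetical helper."""
    if path.endswith(".index"):
        return path[: -len(".index")]
    m = re.match(r"(.*)\.data-\d+-of-\d+$", path)
    return m.group(1) if m else path

print(checkpoint_prefix("model-20180408-102900.ckpt-90.data-00000-of-00001"))
# model-20180408-102900.ckpt-90
```

Paths that are already bare prefixes pass through unchanged, so the helper is safe to apply unconditionally.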
I love you dude <3 It worked for me
Helped me!!! Thank y'all
When I try to load the model (with TensorFlow 1.0.0 or 1.0.1), I get the following message:
DataLossError (see above for traceback): Unable to open table file ../DATAS/20170216-091149/model-20170216-091149.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[Node: save/RestoreV2_278 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_278/tensor_names, save/RestoreV2_278/shape_and_slices)]]
My configuration:
Ubuntu Linux
Python 3.5
TensorFlow 1.0.0 and 1.0.1 (tried in separate conda envs)
Is there another check to make (protobuf version)?