davidsandberg / facenet

Face recognition using Tensorflow
MIT License

pre-trained model unable to load because of tensorflow saver format :( #251

Closed tbarnier closed 7 years ago

tbarnier commented 7 years ago

When I try to load the model (with TensorFlow 1.0.0 or 1.0.1), I'm getting the following message:

DataLossError (see above for traceback): Unable to open table file ../DATAS/20170216-091149/model-20170216-091149.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[Node: save/RestoreV2_278 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_278/tensor_names, save/RestoreV2_278/shape_and_slices)]]

My configuration:

Ubuntu Linux Python 3.5 Tensorflow 1.0.0 and 1.0.1 (tried in conda separate envs)

Is there another check to make (protobuf version)?
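
For context, the restore pattern in question looks roughly like the minimal sketch below (TensorFlow 1.x; the file names are assumptions based on this thread, by analogy with the file listing David posts further down). The "not an sstable (bad magic number)" DataLossError is what the RestoreV2 op reports when the path it opens is not a checkpoint in the format it expects, for example when it points at a .data-* shard or some other file rather than at a valid checkpoint prefix.

```python
import tensorflow as tf  # TF 1.x

model_dir = '../DATAS/20170216-091149'  # path taken from the error message above

with tf.Graph().as_default(), tf.Session() as sess:
    # The .meta file holds the graph definition; the second argument to
    # restore() must be the checkpoint *prefix* (e.g. ...ckpt-250000),
    # not the .data-00000-of-00001 shard and not a bare .ckpt name.
    saver = tf.train.import_meta_graph(model_dir + '/model-20170216-091149.meta')
    saver.restore(sess, model_dir + '/model-20170216-091149.ckpt-250000')
```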

fabriciosantana commented 7 years ago

How can I train a model setting the --pretrained_model parameter?

Same issue here. I am using FloydHub to run my training with the following command.

floyd run --gpu --env tensorflow-1.0:py2 --data uGo93wTLTC6yyKb7c4W7Ri 'python facenet/src/facenet_train_classifier.py --data_dir /input/senadores --logs_base_dir /output/facenet_logs --models_base_dir /output/facenet_models --pretrained_model /input/20170216-091149/model.ckpt'

Before running I have uploaded my dataset and the pretrained model. For this I have used the floyd init and floyd upload commands.

Also, I have tried training with the original model name (model-20170216-091149.ckpt-250000.data-00000-of-00001), as well as after renaming it to model.ckpt, just in case.

My training finishes successfully if I don't set the --pretrained_model parameter. I mean, the call below works just fine:

floyd run --gpu --env tensorflow-1.0:py2 --data uGo93wTLTC6yyKb7c4W7Ri 'python facenet/src/facenet_train_classifier.py --data_dir /input/senadores --logs_base_dir /output/facenet_logs --models_base_dir /output/facenet_models'

The only difference between the two training calls is that in the second one I didn't set the --pretrained_model parameter.

Here is the error:

INFO - W tensorflow/core/framework/op_kernel.cc:993] Data loss: Unable to open table file /input/20170216-091149/model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

INFO - DataLossError (see above for traceback): Unable to open table file /input/20170216-091149/model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

INFO - tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /input/20170216-091149/model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

2017-04-28 11:13:00,100 INFO - [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

2017-04-28 11:13:00,101 INFO - [[Node: save/RestoreV2_86/_575 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_968_save/RestoreV2_86", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

So, how can I train a model setting the --pretrained_model parameter?

ShownX commented 7 years ago

Confirmed with v1.1.0.

davidsandberg commented 7 years ago

Hi, I'm not sure which file you are trying to load. I recently tried loading a pretrained model with train_tripletloss.py and it worked fine. I used --pretrained_model ~/models/20170214-092102/model-20170214-092102.ckpt-80000. Note that there is no file named model-20170214-092102.ckpt-80000 in the 20170214-092102 directory; that is just the checkpoint prefix, and TensorFlow appends .data-00000-of-00001 when restoring (see the sketch after the file listing below). These are the files in the directory:

-rwxrwxrwx 1 root root 96689276 feb 14 18:49 model-20170214-092102.ckpt-80000.data-00000-of-00001
-rwxrwxrwx 1 root root    22478 feb 14 18:49 model-20170214-092102.ckpt-80000.index
-rwxrwxrwx 1 root root 19991968 feb 14 09:29 model-20170214-092102.meta
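
For reference, a quick way to sanity-check that such a prefix resolves to a readable checkpoint is tf.train.NewCheckpointReader; a minimal sketch (the path below is hypothetical):

```python
import tensorflow as tf  # TF 1.x

# Pass the checkpoint *prefix*; TensorFlow resolves the matching
# .index and .data-00000-of-00001 files that share this prefix.
prefix = '/home/user/models/20170214-092102/model-20170214-092102.ckpt-80000'  # hypothetical path

reader = tf.train.NewCheckpointReader(prefix)
# Print a few variable names and shapes; if this raises, the prefix
# (or the checkpoint format) is wrong.
for name, shape in sorted(reader.get_variable_to_shape_map().items())[:5]:
    print(name, shape)
```
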
xmuszq commented 7 years ago

@davidsandberg Hi David, when I try to load the pretrained model, I find there are three sets of similar data. I'm confused about which one I should use. Can you give me some idea?

(screenshot: directory listing showing several sets of checkpoint files for the 20171104-092733 model at different training steps)

Many thanks.

EdwardDixon commented 6 years ago

If you look at the filenames, I believe that you are seeing checkpoints - the model state at different points during training. Choosing the latest is probably a good idea - so model-20171104-092733.ckpt-151000.
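
If you'd rather not pick the checkpoint by hand, tf.train.latest_checkpoint can do it, assuming the directory still contains a 'checkpoint' state file whose recorded paths are valid on your machine; a minimal sketch (the directory path is hypothetical):

```python
import tensorflow as tf  # TF 1.x

model_dir = '/path/to/20171104-092733'  # hypothetical model directory

# Reads the 'checkpoint' state file in model_dir and returns the newest
# recorded prefix, e.g. '.../model-20171104-092733.ckpt-151000', or None
# if no state file is found.
latest = tf.train.latest_checkpoint(model_dir)
print(latest)
```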

StepOITD commented 6 years ago

Hello, I'm using the 20180402-114759 model, and I got the same issue. I'm sure that my path is right:

--pretrained_model /home/constantine/DataDisk/models/facenet/20180402-114759/model-20180402-114759.ckpt-275.data-00000-of-00001

and I get:

DataLossError (see above for traceback): Unable to open table file /home/constantine/DataDisk/models/facenet/20180402-114759/model-20180402-114759.ckpt-275.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

[[Node: save/RestoreV2/_907 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Does anyone know how to fix this?

hamhochoi commented 6 years ago

@StepOITD Did you find a solution for this issue? I got the same error when I tried to export my trained model following the script here.

StepOITD commented 6 years ago

I found a modified version in a PR, and it worked.

sanjayanayak commented 6 years ago

Hi @StepOITD, I am getting the same error. Please help me figure it out.

anubhav0fnu commented 6 years ago

I found a modified version in a PR, and it worked.

@StepOITD, could you please post how you tackled the issue? I am getting the same error even though all the files for multiple checkpoints are available in the same directory.

ndinhtuan commented 5 years ago

@anubhav0fnu I tried the 2018 model and wanted to restore model-20180408-102900.ckpt-90.data-00000-of-00001. I just passed the argument --pretrained_model model-20180408-102900.ckpt-90 and it worked for me. I read a topic on Stack Overflow that said the restore function of tf.Saver appends "data-00000-of-00001" to the end of the model path automatically.
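
If the path you have at hand already ends with the shard suffix, one way to normalize it before passing it to --pretrained_model is simply to strip that suffix; a minimal sketch (the helper name is made up):

```python
import re

def to_ckpt_prefix(path):
    # Drop a trailing '.data-#####-of-#####' shard suffix, if present,
    # so the result is the checkpoint prefix that saver.restore() expects.
    return re.sub(r'\.data-\d{5}-of-\d{5}$', '', path)

print(to_ckpt_prefix('model-20180408-102900.ckpt-90.data-00000-of-00001'))
# -> model-20180408-102900.ckpt-90
```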

KanchanIIT commented 5 years ago

As @theiron97 said, you don't need to append "data-00000-of-00001" to the file name. Just provide the filename without that part.

Choapinus commented 5 years ago

@anubhav0fnu I tried the 2018 model and wanted to restore model-20180408-102900.ckpt-90.data-00000-of-00001. I just passed the argument --pretrained_model model-20180408-102900.ckpt-90 and it worked for me. I read a topic on Stack Overflow that said the restore function of tf.Saver appends "data-00000-of-00001" to the end of the model path automatically.

I love you dude <3 It worked for me

JinyuJinyuJinyu commented 3 years ago

@anubhav0fnu I tried the 2018 model and wanted to restore model-20180408-102900.ckpt-90.data-00000-of-00001. I just passed the argument --pretrained_model model-20180408-102900.ckpt-90 and it worked for me. I read a topic on Stack Overflow that said the restore function of tf.Saver appends "data-00000-of-00001" to the end of the model path automatically.

Helped me!!! Thank y'all.