ValueError: The passed save_path is not a valid checkpoint: model/crnn_syn90k_kr92000/shadownet.ckpt

kspook commented 5 years ago

@MaybeShewill-CV, I get an error after training on Korean data. With the previous version I was able to export the model, as you know. With the latest version, which includes Eldon's modification, I get an error.

According to #178, this error also occurred in the previous version, but no solution is mentioned in #178.

Do you know the solution?

bash tfserve/export_crnn_saved_model.sh 
++ pwd
+ PYTHONPATH=/data/home/kspook/CRNN_Tensorflow
+ python tfserve/export_saved_model.py --export_dir model/crnn_syn90k_saved_model_kr --ckpt_path model/crnn_syn90k_kr92000/shadownet.ckpt
2019-07-11 08:35:35.544136: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 08:35:35.643497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: acd9:00:00.0
totalMemory: 11.17GiB freeMemory: 3.16GiB
2019-07-11 08:35:35.643547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 08:35:36.075044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 08:35:36.075094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-07-11 08:35:36.075108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-07-11 08:35:36.075285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10295 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: acd9:00:00.0, compute capability: 3.7)
Traceback (most recent call last):
  File "tfserve/export_saved_model.py", line 136, in <module>
    build_saved_model(args.ckpt_path, args.export_dir)
  File "tfserve/export_saved_model.py", line 93, in build_saved_model
    saver.restore(sess=sess, save_path=ckpt_path)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
    + compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: model/crnn_syn90k_kr92000/shadownet.ckpt

MaybeShewill-CV commented 5 years ago

@kspook I think the error msg is pretty clear here. Please check your checkpoint file:)
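
One way to check the checkpoint before calling saver.restore() is to look at the files on disk. A TensorFlow V2-format checkpoint is addressed by a prefix, not by a single file: the saver writes `<prefix>.index` and one or more `<prefix>.data-*` shards. The sketch below uses only the standard library; the function name `describe_checkpoint_prefix` is a hypothetical helper, not part of this repository or of TensorFlow.

```python
import glob
import os


def describe_checkpoint_prefix(prefix):
    """Report which files exist for a V2-format checkpoint prefix.

    `prefix` is the value that would be passed as save_path,
    e.g. "model/crnn_syn90k/shadownet.ckpt" (hypothetical example).
    """
    index_file = prefix + ".index"
    # glob.escape protects characters such as "[" that may appear in the path.
    data_files = sorted(glob.glob(glob.escape(prefix) + ".data-*"))
    index_exists = os.path.isfile(index_file)
    return {
        "index_file_exists": index_exists,
        "data_files": data_files,
        "looks_valid": index_exists and bool(data_files),
    }


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        print(candidate, describe_checkpoint_prefix(candidate))
```

If training saved the model with a global step, the prefix ends in `-<step>`, so a path without the step suffix would not pass this check even though checkpoint files exist in the directory.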

kspook commented 5 years ago

In this case, https://github.com/mrharicot/monodepth/issues/51#issuecomment-457345463, using an absolute path was the solution.

But that doesn't solve it for me.

kspook commented 5 years ago

According to https://github.com/tensorflow/tensorflow/issues/22443#issuecomment-426462811, this error means that the checkpoint file is missing. The writer recommends checking the save() call.

kspook commented 5 years ago

According to https://blog.csdn.net/MachineRandy/article/details/79624010, the error means that the variables aren't defined properly.

Related discussions online suggest restarting the kernel (Spyder editor) and switching to tf.train.Saver(write_version=tf.train.SaverDef.V1) to restore the V1 format.

Reason: The real cause is that my code saved and then loaded the model in the same program, so the variables were defined twice.

W = tf.Variable(xxx,name="weight")

This is equivalent to creating a variable with name="weight" twice in the TensorFlow default graph. The actual name of the second (nth) variable becomes "weight_1" ("weight_n-1"), so restoring the checkpoint looks up "weight_n-1" instead of "weight", which causes an error.

Solution:
(1) When loading, define the variables with the same names, and call tf.reset_default_graph() first to clear the default graph.
(2) In normal use the model is not saved and loaded in the same program, so this does not happen. Otherwise, restart the kernel (Spyder) after saving, and then load the parameters.
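
The naming mismatch described in the quoted blog post can be illustrated without TensorFlow. The class below is a toy stand-in written for this illustration, not TensorFlow code; it only mimics the behavior that a second variable created with the same name receives a numeric suffix.

```python
class ToyGraph:
    """Toy stand-in for a graph that makes variable names unique (not TensorFlow)."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        # First use keeps the name; later uses get "_1", "_2", ...
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)


# Names stored when the model was saved.
checkpoint_names = {"weight"}

graph = ToyGraph()
first = graph.unique_name("weight")   # created when building the model for saving
second = graph.unique_name("weight")  # created again in the same graph for loading

# The second definition no longer matches the name stored in the checkpoint.
mismatch = second not in checkpoint_names

# Starting from a fresh graph (analogous to tf.reset_default_graph())
# gives the variable its original name again.
fresh = ToyGraph()
restored_name = fresh.unique_name("weight")
```

In this toy model the lookup succeeds only for the fresh graph, which mirrors the first solution listed in the blog post.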

MaybeShewill-CV commented 5 years ago

@kspook How can you be sure there is nothing wrong with your checkpoint file? Have you tested whether you can successfully restore the model from the checkpoint file?

kspook commented 5 years ago

@MaybeShewill-CV, thank you for your reply. I didn't mean that you are wrong. I am just checking what I need to do.

I did run the test, but it fails with the same error.


python tools/test_shadownet.py --image_path data/test_images/test_01.jpg --weights_path model/crnn_syn90k/shadownet.ckpt  --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json 
2019-07-11 11:22:25.060100: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 11:22:30.269278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: acd9:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-07-11 11:22:30.269327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 11:22:30.561576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 11:22:30.561637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-07-11 11:22:30.561653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-07-11 11:22:30.561908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6863 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: acd9:00:00.0, compute capability: 3.7)
Traceback (most recent call last):
  File "tools/test_shadownet.py", line 161, in <module>
    is_vis=args.visualize
  File "tools/test_shadownet.py", line 126, in recognize
    saver.restore(sess=sess, save_path=weights_path)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
    + compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: model/crnn_syn90k/shadownet.ckpt

MaybeShewill-CV commented 5 years ago

@kspook 1. Perhaps your checkpoint file path is wrong. 2. Perhaps the checkpoint file was saved incorrectly.

kspook commented 5 years ago

The test runs OK with the checkpoint name below.

python tfserve/export_saved_model.py --image_path data/test_images/test_01.jpg --weights_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json

So I ran the export directly, without the bash script, but it doesn't work. It seems Python can't handle the - in the path name. I am trying to solve the problem.

python tfserve/export_saved_model.py --weights_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json

MaybeShewill-CV commented 5 years ago

@kspook You get this error when you pass the wrong checkpoint file path. Python can handle a file path such as shadownet_2019-07-11-09-41-05.ckpt-4000
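
To see which checkpoint prefixes a directory actually contains, the `.index` files can be listed and sorted by their step suffix. This is a minimal sketch using only the standard library; `list_checkpoint_prefixes` is a hypothetical helper name, and the sketch assumes the V2 checkpoint layout in which every checkpoint has a `<prefix>.index` file. TensorFlow itself offers tf.train.latest_checkpoint(checkpoint_dir) for the most recent one.

```python
import os
import re

# Matches a trailing "-<step>" such as "-4000" at the end of a prefix.
_STEP_SUFFIX = re.compile(r"-(\d+)$")


def list_checkpoint_prefixes(model_dir):
    """Return checkpoint prefixes found in model_dir, ordered by step."""
    found = []
    for name in os.listdir(model_dir):
        if not name.endswith(".index"):
            continue
        prefix = name[: -len(".index")]
        match = _STEP_SUFFIX.search(prefix)
        # Prefixes without a step suffix sort first.
        step = int(match.group(1)) if match else -1
        found.append((step, os.path.join(model_dir, prefix)))
    return [path for _, path in sorted(found)]


if __name__ == "__main__":
    import sys

    for prefix in list_checkpoint_prefixes(sys.argv[1]):
        print(prefix)
```

Each printed value can be passed as the checkpoint path unchanged, including the hyphens and the step suffix.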

kspook commented 5 years ago

1. bash


#!/usr/bin/env bash
# author: github.com/eldon

set -eux

PYTHONPATH=$(pwd) python tfserve/export_saved_model.py \
    --export_dir model/crnn_saved_model_kr \
    --ckpt_path  model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 \
    --char_dict_path data/char_dict/char_dict.json \
    --ord_map_dict_path data/char_dict/ord_map.json

rm -rf /tmp/crnn/1
mkdir -p /tmp/crnn/1
mv -f model/crnn_saved_model_kr/* /tmp/crnn/1
#mv -f model/crnn_syn90k_saved_model_kr/* /tmp/crnn/1

1.1 error

bash  tfserve/export_crnn_saved_model.sh 
++ pwd
+ PYTHONPATH=/data/home/kspook/CRNN_Tensorflow
+ python tfserve/export_saved_model.py --export_dir model/crnn_saved_model_kr --ckpt_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
main args :  model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
ops :  model/crnn_syn90k
2019-07-11 22:02:01.639882: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 22:02:03.718520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: b956:00:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2019-07-11 22:02:03.718569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 22:02:12.217222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 22:02:12.217268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-07-11 22:02:12.217282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-07-11 22:02:12.217564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10984 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: b956:00:00.0, compute capability: 3.7)
ckpt_path :  model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
+ --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
tfserve/export_crnn_saved_model.sh: line 12: --char_dict_path: command not found
(crnntf) kspook@MLGPU011:/data/home/kspook/CRNN_Tensorflow$

script : python tfserve/export_saved_model.py --export_dir model/crnn_saved_model_kr --ckpt_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
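
The trace above shows bash running `--char_dict_path ...` as a separate command, which usually means the line continuation before that option is broken: a missing trailing backslash, whitespace after the backslash, or a comment line in between. The sketch below is a rough heuristic for spotting such lines in a script file; `find_broken_continuations` is a hypothetical helper, and it only inspects lines that start with `--`.

```python
def find_broken_continuations(script_text):
    """Return (line_number, reason) for option lines bash would run as a new command."""
    problems = []
    lines = script_text.splitlines()
    for i, line in enumerate(lines):
        if not line.lstrip().startswith("--"):
            continue
        if i == 0:
            problems.append((1, "option line has no preceding command"))
            continue
        prev = lines[i - 1]
        if prev.lstrip().startswith("#"):
            # A backslash inside a comment does not continue the command.
            problems.append((i + 1, "previous line is a comment"))
        elif prev.endswith("\\"):
            continue  # proper continuation
        elif prev.rstrip().endswith("\\"):
            problems.append((i + 1, "whitespace after backslash on previous line"))
        else:
            problems.append((i + 1, "previous line has no trailing backslash"))
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for number, reason in find_broken_continuations(handle.read()):
            print("line {}: {}".format(number, reason))
```

Running it on the export script would list any option line that is not joined to the python command; an empty result means this heuristic found nothing.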

2.1 no response


python tfserve/export_saved_model.py --export_dir model/crnn_saved_model_kr --ckpt_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
main args :  model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
ops :  model/crnn_syn90k
2019-07-11 22:06:07.699451: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 22:06:09.564509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: b956:00:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2019-07-11 22:06:09.564556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 22:06:09.856700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 22:06:09.856750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-07-11 22:06:09.856764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-07-11 22:06:09.857028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10984 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: b956:00:00.0, compute capability: 3.7)
ckpt_path :  model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000

MaybeShewill-CV commented 5 years ago

@kspook Everything works fine in my local machine:)

MaybeShewill-CV / CRNN_Tensorflow

ValueError: The passed save_path is not a valid checkpoint: model/crnn_syn90k_kr92000/shadownet.ckpt #311