Closed kspook closed 5 years ago
@kspook I think the error message is pretty clear here. Please check your checkpoint file :)
In this case, https://github.com/mrharicot/monodepth/issues/51#issuecomment-457345463, using the absolute path was the solution.
But that doesn't solve it for me.
According to https://github.com/tensorflow/tensorflow/issues/22443#issuecomment-426462811, this error means that the checkpoint file is missing, and the commenter recommends checking save().
According to https://blog.csdn.net/MachineRandy/article/details/79624010, the error means that the variables aren't defined properly.
After browsing related discussions online, the suggestions were to restart the kernel (Spyder editor) and to change the saver to tf.train.Saver(write_version=tf.train.SaverDef.V1) to fall back to the V1 checkpoint format.
Reason: The real cause was that my code saved and then loaded the model in the same run, so the variable was defined twice, once before saving and once before loading:
W = tf.Variable(xxx, name="weight")
This is equivalent to creating a variable with name="weight" twice in the TensorFlow graph. The second (nth) variable is actually named "weight_1" ("weight_n-1"), so when the checkpoint is loaded, the lookup is for "weight_n-1" instead of "weight", which causes the error.
Solution:
(1) When loading, define the variables with the same names as when saving, and call tf.reset_default_graph() first to clear the default graph stack and reset the global default graph.
(2) In the normal case, where the model is not saved and loaded in the same program, this does not happen. Otherwise, restart the kernel (Spyder) after saving and then load the parameters.
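The naming rule described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not TensorFlow code: `ToyGraph`, `unique_name`, and `reset` are hypothetical names invented for the illustration, and `reset` only plays the role that the quoted post assigns to tf.reset_default_graph().

```python
# Toy model of how a TF1-style graph hands out unique variable names.
# NOT TensorFlow: ToyGraph and its methods are hypothetical illustrations.


class ToyGraph:
    def __init__(self):
        # Maps a requested name to how many times it has been requested.
        self._names_in_use = {}

    def unique_name(self, name):
        """Return `name` the first time, then `name_1`, `name_2`, ..."""
        count = self._names_in_use.get(name, 0)
        self._names_in_use[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)

    def reset(self):
        # Stands in for tf.reset_default_graph(): forget every name.
        self._names_in_use.clear()


graph = ToyGraph()
saved_as = graph.unique_name("weight")    # variable built before saving
loaded_as = graph.unique_name("weight")   # same code run again before loading

graph.reset()
after_reset = graph.unique_name("weight")  # the name is free again

print(saved_as, loaded_as, after_reset)
```

In this model the second definition no longer carries the name that was written at save time, which is the mismatch the post describes. Resetting before rebuilding gives the variable its original name back.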
---------------------
@kspook How can you be sure there is nothing wrong with your checkpoint file? Have you tested whether you can successfully restore the model from the checkpoint file?
@MaybeShewill-CV, thank you for your reply. I didn't mean you are wrong. I am just checking what I need to do.
I did test, but the test gives the same error.
python tools/test_shadownet.py --image_path data/test_images/test_01.jpg --weights_path model/crnn_syn90k/shadownet.ckpt --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json
2019-07-11 11:22:25.060100: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 11:22:30.269278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: acd9:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-07-11 11:22:30.269327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 11:22:30.561576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 11:22:30.561637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-11 11:22:30.561653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-11 11:22:30.561908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6863 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: acd9:00:00.0, compute capability: 3.7)
Traceback (most recent call last):
File "tools/test_shadownet.py", line 161, in <module>
is_vis=args.visualize
File "tools/test_shadownet.py", line 126, in recognize
saver.restore(sess=sess, save_path=weights_path)
File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
+ compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: model/crnn_syn90k/shadownet.ckpt
@kspook 1. Perhaps your checkpoint file path is wrong. 2. Perhaps the checkpoint file was saved incorrectly.
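One way to check the first possibility is to look at what is actually on disk. The sketch below uses only the Python standard library and assumes the checkpoints were written in TensorFlow's V2 format, where each checkpoint prefix has a `<prefix>.index` file next to its `<prefix>.data-*` shards. The function names are hypothetical helpers, not part of this repository or of TensorFlow.

```python
import glob
import os


def list_checkpoint_prefixes(directory):
    """Return every prefix in `directory` that has a `<prefix>.index` file."""
    pattern = os.path.join(glob.escape(directory), "*.index")
    return sorted(path[: -len(".index")] for path in glob.glob(pattern))


def describe_checkpoint_prefix(prefix):
    """Report which files on disk belong to the given checkpoint prefix."""
    return {
        # V2-format checkpoints are addressed by prefix and keep an index file.
        "has_index": os.path.isfile(prefix + ".index"),
        # V1-format checkpoints are a single file named exactly like the prefix.
        "has_legacy_file": os.path.isfile(prefix),
        "files": sorted(glob.glob(glob.escape(prefix) + ".*")),
    }


if __name__ == "__main__":
    # Directory and prefix taken from the commands in this thread.
    print(list_checkpoint_prefixes("model/crnn_syn90k"))
    print(describe_checkpoint_prefix("model/crnn_syn90k/shadownet.ckpt"))
```

If the prefix passed to `saver.restore` does not appear in the first list, the value given to `--weights_path` is probably not one of the saved prefixes.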
The test runs OK with the checkpoint name below.
python tfserve/export_saved_model.py --image_path data/test_images/test_01.jpg --weights_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json
So I ran it without the bash shell script, but it still doesn't work.
It seems Python can't handle the - in the path name.
I am trying to solve the problem:
python tfserve/export_saved_model.py --weights_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json
@kspook You get that error when you pass the wrong checkpoint file path. Python has no problem with a file path like shadownet_2019-07-11-09-41-05.ckpt-4000
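A quick way to confirm this is to parse the same value in isolation. This is a minimal sketch that assumes the scripts read their flags with `argparse`. The parser below is a stand-in built for the illustration, not the repository's own argument parser.

```python
import argparse

# Stand-in parser with a single flag, mirroring the --weights_path option.
parser = argparse.ArgumentParser()
parser.add_argument("--weights_path", type=str)

# Hyphens inside a value are fine; only a value that *starts* with "-"
# would be mistaken for another option.
args = parser.parse_args(
    ["--weights_path", "model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000"]
)
print(args.weights_path)
```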
1.bash
#!/usr/bin/env bash
# author: github.com/eldon
set -eux
PYTHONPATH=$(pwd) python tfserve/export_saved_model.py \
--export_dir model/crnn_saved_model_kr \
--ckpt_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 \
--char_dict_path data/char_dict/char_dict.json \
--ord_map_dict_path data/char_dict/ord_map.json
rm -rf /tmp/crnn/1
mkdir -p /tmp/crnn/1
mv -f model/crnn_saved_model_kr/* /tmp/crnn/1
#mv -f model/crnn_syn90k_saved_model_kr/* /tmp/crnn/1
1.1 error
bash tfserve/export_crnn_saved_model.sh
++ pwd
+ PYTHONPATH=/data/home/kspook/CRNN_Tensorflow
+ python tfserve/export_saved_model.py --export_dir model/crnn_saved_model_kr --ckpt_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
main args : model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
ops : model/crnn_syn90k
2019-07-11 22:02:01.639882: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 22:02:03.718520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: b956:00:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2019-07-11 22:02:03.718569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 22:02:12.217222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 22:02:12.217268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-11 22:02:12.217282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-11 22:02:12.217564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10984 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: b956:00:00.0, compute capability: 3.7)
ckpt_path : model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
+ --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
tfserve/export_crnn_saved_model.sh: line 12: --char_dict_path: command not found
(crnntf) kspook@MLGPU011:/data/home/kspook/CRNN_Tensorflow$
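The trace above shows bash running `python ... --ckpt_path ...` as one command and then trying to run `--char_dict_path` as a second command, so the line continuation after the `--ckpt_path` line did not take effect. One common cause is whitespace or a Windows carriage return after the trailing backslash. A missing backslash would look the same. Whether either applies to this script is a guess, since the pasted copy does show the backslash. The sketch below, with a hypothetical helper name, flags such lines:

```python
import re


def find_broken_continuations(script_text):
    """Return 1-based numbers of lines where a backslash is followed only by
    spaces, tabs, or a carriage return, so bash does not join the next line."""
    broken = []
    # Split on "\n" only, so a "\r" from CRLF line endings stays visible.
    for number, line in enumerate(script_text.split("\n"), start=1):
        if re.search(r"\\[ \t\r]+$", line):
            broken.append(number)
    return broken


# Made-up sample: the second line has a space after its backslash.
sample = (
    "python export.py \\\n"
    "  --ckpt_path model.ckpt-4000 \\ \n"
    "  --char_dict_path char_dict.json\n"
)
print(find_broken_continuations(sample))
```

To check the real script, read tfserve/export_crnn_saved_model.sh in binary mode, decode it, and pass the text to the function.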
2.1 no response
python tfserve/export_saved_model.py --export_dir model/crnn_saved_model_kr --ckpt_path model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000 --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
main args : model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
ops : model/crnn_syn90k
2019-07-11 22:06:07.699451: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-11 22:06:09.564509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: b956:00:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2019-07-11 22:06:09.564556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-11 22:06:09.856700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 22:06:09.856750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-11 22:06:09.856764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-11 22:06:09.857028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10984 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: b956:00:00.0, compute capability: 3.7)
ckpt_path : model/crnn_syn90k/shadownet_2019-07-11-09-41-05.ckpt-4000
@kspook Everything works fine on my local machine :)
@MaybeShewill-CV, I get an error after training on Korean. In the previous version I got exporting to work, as you know, but I get an error in the latest version with Eldon's modification.
According to #178, this also occurred in the previous version, but he didn't mention the solution in #178.
Do you know the solution?