Closed · 7017227 closed 6 years ago
When I test with the 'checkpoint' file it shows a data loss error, and when I test with the .meta or .data file it shows this error:
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for out/lsp_alexnet_imagenet_small/checkpoint-100000.data
[[Node: save/RestoreV2_3 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_3/tensor_names, save/RestoreV2_3/shape_and_slices)]]
[[Node: save/RestoreV2/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
I will have to check this. Now I'm a bit busy with other projects.
Try changing the tf.train.Saver version.
How do I change the tf.Saver version?
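In case it helps, here is a minimal sketch of what I understand "changing the Saver version" to mean (assuming a TF build with the `compat.v1` layer; on old TF 1.x you would just use `import tensorflow as tf`):

```python
import tensorflow.compat.v1 as tf  # on plain TF 1.x: import tensorflow as tf

tf.disable_eager_execution()

# A dummy variable so the Saver has something to manage.
v = tf.get_variable("v", shape=[1])

# write_version picks the on-disk format the Saver *writes*:
# SaverDef.V1 produces a single checkpoint file, while SaverDef.V2 (the
# default since TF 0.12) produces a prefix plus .index / .data-00000-of-00001
# shards. Restore auto-detects the format, so a version mismatch usually
# means the restore path is wrong rather than the Saver version.
saver = tf.train.Saver(write_version=tf.train.SaverDef.V1)
```

Note that this only changes how new checkpoints are saved; it does not change how an existing checkpoint is read.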
I tried to test the trained LSP snapshots during training:
wonjinlee@alpha:~/deeppose/out/lsp_alexnet_imagenet_small$ ls
checkpoint
checkpoint-100000.data-00000-of-00001
checkpoint-100000.index
checkpoint-100000.meta
checkpoint-110000.data-00000-of-00001
checkpoint-110000.index
checkpoint-110000.meta
checkpoint-120000.data-00000-of-00001
checkpoint-120000.index
checkpoint-120000.meta
checkpoint-130000.data-00000-of-00001
checkpoint-130000.index
checkpoint-130000.meta
checkpoint-90000.data-00000-of-00001
checkpoint-90000.index
checkpoint-90000.meta
events.out.tfevents.1510238719.alpha
params.dump_171108_222950.txt
params.dump_171108_223930.txt
params.dump_171108_224108.txt
params.dump_171108_224641.txt
params.dump_171109_002231.txt
params.dump_171109_020558.txt
params.dump_171109_034216.txt
params.dump_171109_043955.txt
params.dump_171109_060922.txt
params.dump_171109_061701.txt
params.dump_171109_145127.txt
params.dump_171109_145344.txt
params.dump_171109_145635.txt
params.dump_171109_170839.txt
params.dump_171109_234514.txt
But it shows this error:
2017-11-10 17:42:08.970095: W tensorflow/core/framework/op_kernel.cc:1192] Data loss: Unable to open table file out/lsp_alexnet_imagenet_small/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file out/lsp_alexnet_imagenet_small/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
[[Node: save/RestoreV2/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "tests/test_snapshot.py", line 116, in <module>
    main(dataset_name, snapshot_path)
  File "tests/test_snapshot.py", line 79, in main
    test_net(test_dataset, test_iterator, dataset_name, snapshot_path)
  File "tests/test_snapshot.py", line 92, in test_net
    gpu_memory_fraction=0.32)  # Set how much GPU memory to reserve for the network
  File "/home/wonjinlee/deeppose/scripts/regressionnet.py", line 94, in create_regression_net
    saver.restore(net.sess, init_snapshot_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1560, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file out/lsp_alexnet_imagenet_small/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
[[Node: save/RestoreV2/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op 'save/RestoreV2_5', defined at:
  File "tests/test_snapshot.py", line 116, in <module>
    main(dataset_name, snapshot_path)
  File "tests/test_snapshot.py", line 79, in main
    test_net(test_dataset, test_iterator, dataset_name, snapshot_path)
  File "tests/test_snapshot.py", line 92, in test_net
    gpu_memory_fraction=0.32)  # Set how much GPU memory to reserve for the network
  File "/home/wonjinlee/deeppose/scripts/regressionnet.py", line 93, in create_regression_net
    saver = tf.train.Saver()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
    dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access
DataLossError (see above for traceback): Unable to open table file out/lsp_alexnet_imagenet_small/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
[[Node: save/RestoreV2/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_74_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
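For context (my understanding, not confirmed by the thread): the file literally named `checkpoint` is a small text-format CheckpointState file that only lists checkpoint prefixes; it contains no tensor data, which is why RestoreV2 rejects it with "not an sstable (bad magic number)". A stdlib sketch of what it typically contains and how the current prefix could be read back (the example contents are illustrative, not from this repository):

```python
import re

# Illustrative contents of the `checkpoint` state file (text format):
state_text = '''model_checkpoint_path: "checkpoint-130000"
all_model_checkpoint_paths: "checkpoint-90000"
all_model_checkpoint_paths: "checkpoint-130000"
'''

def current_prefix(text):
    """Return the prefix named by model_checkpoint_path, or None if absent."""
    m = re.search(r'model_checkpoint_path:\s*"([^"]+)"', text)
    return m.group(1) if m else None

print(current_prefix(state_text))  # → checkpoint-130000
```

In TensorFlow itself, tf.train.latest_checkpoint(checkpoint_dir) does this resolution for you and returns the newest prefix.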
Why does this kind of error happen? Does testing not work while training is still running? How can I resolve this error?
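As far as I can tell, saver.restore() must be given the checkpoint *prefix* (e.g. out/lsp_alexnet_imagenet_small/checkpoint-100000), not the `checkpoint` state file and not a .data/.index/.meta shard. A stdlib sketch (the helper name is mine; the suffix patterns are the usual V2 checkpoint naming) that normalizes whatever path is passed in:

```python
import re

def to_restore_prefix(path):
    """Strip V2 checkpoint shard suffixes so the result is a restore prefix."""
    # e.g. checkpoint-100000.data-00000-of-00001 -> checkpoint-100000
    return re.sub(r'\.(meta|index|data-\d{5}-of-\d{5})$', '', path)

print(to_restore_prefix("out/lsp_alexnet_imagenet_small/checkpoint-100000.data-00000-of-00001"))
# → out/lsp_alexnet_imagenet_small/checkpoint-100000
```

With that prefix, saver.restore(sess, prefix) should find the .index and .data files itself; alternatively, tf.train.latest_checkpoint("out/lsp_alexnet_imagenet_small") returns the newest valid prefix directly.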