Open asif3058 opened 5 years ago
Could you please check that you have the right version of TensorFlow installed for this?
I used TensorFlow 1.10 here and updated the code for that version. Training worked well. I'm not sure whether this model can only be run in TensorFlow 0.10 or whether it can also be run in version 1.10.
Unfortunately, I don't think you can use those pre-trained models with the newer version of TensorFlow.
I tried to run the model for evaluation and got an error. The log is posted here:
Command: python document_summarizer_training_testing.py --use_gpu /gpu:2 --data_mode cnn --exp_mode test --model_to_load 2 --train_dir training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5 --num_sample_rollout 5 > training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/test.model2.log
Error: Traceback (most recent call last):
File "document_summarizer_training_testing.py", line 291, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "document_summarizer_training_testing.py", line 287, in main
test()
File "document_summarizer_training_testing.py", line 259, in test
model.saver.restore(sess, selected_modelpath)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1562, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Tensor name "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1" not found in checkpoint files training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/model.ckpt.epoch-2 [[node save/RestoreV2 (defined at /media/gtx/data/Asif/Refresh-master/my_model.py:73) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Caused by op u'save/RestoreV2', defined at:
File "document_summarizer_training_testing.py", line 291, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "document_summarizer_training_testing.py", line 287, in main
test()
File "document_summarizer_training_testing.py", line 244, in test
model = MY_Model(sess, len(vocab_dict)-2)
File "/media/gtx/data/Asif/Refresh-master/my_model.py", line 73, in __init__
self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1102, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Tensor name "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1" not found in checkpoint files training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/model.ckpt.epoch-2 [[node save/RestoreV2 (defined at /media/gtx/data/Asif/Refresh-master/my_model.py:73) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
What could be the issue here?
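One way to narrow this down is to compare the variable names the graph expects against the names actually stored in the checkpoint. The sketch below is only a diagnostic aid, not the repository's own code. It assumes a TensorFlow 1.x install where `tf.train.list_variables` and `tf.global_variables` are available, and the helper names (`diff_variable_names`, `checkpoint_variable_names`, `graph_variable_names`) are hypothetical. TensorFlow is imported lazily so the comparison helper can be used on plain lists of names.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Compare two collections of variable names.

    Returns a pair of sorted lists:
      - names the graph expects but the checkpoint lacks
      - names stored in the checkpoint that the graph never asks for
    """
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_names)
    missing_from_checkpoint = sorted(graph_set - ckpt_set)
    unused_in_graph = sorted(ckpt_set - graph_set)
    return missing_from_checkpoint, unused_in_graph


def checkpoint_variable_names(checkpoint_path):
    """List the variable names saved in a checkpoint (assumes TF 1.x)."""
    import tensorflow as tf  # lazy import: only needed for real checkpoints
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


def graph_variable_names():
    """List the names of global variables in the default graph (assumes TF 1.x).

    Call this after the model has been constructed, so the graph is populated.
    """
    import tensorflow as tf  # lazy import, same reason as above
    return [v.op.name for v in tf.global_variables()]


# Hypothetical usage, after building the model exactly as in test():
#   missing, unused = diff_variable_names(
#       graph_variable_names(),
#       checkpoint_variable_names(selected_modelpath))
#   print("expected by graph, absent from checkpoint:", missing)
#   print("in checkpoint, not requested by graph:", unused)
```

If a name such as `PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1` shows up in the first list while a similar name without the `_1` suffix shows up in the second, that would be consistent with the graph naming its variables differently from the code that wrote the checkpoint. Whether the TensorFlow version change is the cause here is not something the log alone confirms.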