EdinburghNLP / Refresh

Ranking Sentences for Extractive Summarization with Reinforcement Learning
BSD 3-Clause "New" or "Revised" License

Restoring from checkpoint failed in evaluation #18

Open asif3058 opened 5 years ago

asif3058 commented 5 years ago

I tried to run the model for evaluation and got an error. The log is posted here:

Command: python document_summarizer_training_testing.py --use_gpu /gpu:2 --data_mode cnn --exp_mode test --model_to_load 2 --train_dir training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5 --num_sample_rollout 5 > training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/test.model2.log

Error:
Traceback (most recent call last):
  File "document_summarizer_training_testing.py", line 291, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "document_summarizer_training_testing.py", line 287, in main
    test()
  File "document_summarizer_training_testing.py", line 259, in test
    model.saver.restore(sess, selected_modelpath)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1562, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Tensor name "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1" not found in checkpoint files training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/model.ckpt.epoch-2
	 [[node save/RestoreV2 (defined at /media/gtx/data/Asif/Refresh-master/my_model.py:73) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op u'save/RestoreV2', defined at:
  File "document_summarizer_training_testing.py", line 291, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "document_summarizer_training_testing.py", line 287, in main
    test()
  File "document_summarizer_training_testing.py", line 244, in test
    model = MY_Model(sess, len(vocab_dict)-2)
  File "/media/gtx/data/Asif/Refresh-master/my_model.py", line 73, in __init__
    self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Tensor name "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1" not found in checkpoint files training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/model.ckpt.epoch-2
	 [[node save/RestoreV2 (defined at /media/gtx/data/Asif/Refresh-master/my_model.py:73) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

What could be the issue here?
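The missing key ends in `conv_biases_1`. In TensorFlow 1.x a trailing `_1`, `_2`, ... usually means a name was created a second time in the same scope (for example, `tf.Variable` called twice with the same name). One plausible cause, not confirmed in this thread, is that the ported graph-building code creates that variable more than once or scopes it differently from the code that wrote the checkpoint. The sketch below is a hypothetical diagnostic helper (`base_name` and `diagnose` are illustrative names, not part of Refresh or TensorFlow). It compares the variable names the graph wants against the names stored in the checkpoint and reports, for each missing name, a checkpoint name that differs only by such a suffix. It assumes you have already collected both name lists; in TensorFlow 1.x, `tf.train.list_variables(ckpt_path)` returns `(name, shape)` pairs for a checkpoint, and `[v.op.name for v in tf.global_variables()]` gives the graph side. It is written to run under both Python 2.7 and Python 3.

```python
import re

# TensorFlow 1.x typically makes a duplicated name unique by appending "_<n>"
# (e.g. "conv_biases" -> "conv_biases_1").
_UNIQUIFY_SUFFIX = re.compile(r"_\d+$")


def base_name(var_name):
    """Drop a trailing "_<n>" from the last path component only.

    Scope components such as "Conv1D_1" are left untouched, because their
    numeric suffix may be a legitimate part of the scope name.
    """
    scope, _, leaf = var_name.rpartition("/")
    leaf = _UNIQUIFY_SUFFIX.sub("", leaf)
    return scope + "/" + leaf if scope else leaf


def diagnose(graph_names, ckpt_names):
    """Map each graph variable missing from the checkpoint to a likely match.

    The value is a checkpoint name that differs only by a trailing "_<n>"
    on the last component, or None when no such name exists.
    """
    ckpt = set(ckpt_names)
    report = {}
    for name in graph_names:
        if name in ckpt:
            continue  # this variable can be restored as-is
        candidate = base_name(name)
        report[name] = candidate if candidate in ckpt else None
    return report


if __name__ == "__main__":
    # Hypothetical name lists for illustration only, not read from a real checkpoint.
    graph_vars = [
        "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases",
        "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1",
    ]
    ckpt_vars = ["PolicyNetwork/ConvLayer/Conv1D_1/conv_biases"]
    for missing, match in sorted(diagnose(graph_vars, ckpt_vars).items()):
        print("{} -> {}".format(missing, match or "no close match in checkpoint"))
```

If every missing name maps to an existing checkpoint name, the graph is probably creating duplicate variables rather than lacking them. A name match alone does not show that shapes or semantics agree across TensorFlow versions.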

shashiongithub commented 5 years ago

Could you please check that you have the right version of TensorFlow installed for this?

asif3058 commented 5 years ago

I used TensorFlow 1.10 and updated the code for that version. Training worked well. I'm not sure whether this model can only be run with TensorFlow 0.10 or whether it also works with version 1.10.

shashiongithub commented 5 years ago

Unfortunately, I don't think you can use those pre-trained models with the newer version of TensorFlow.