brightmart / text_classification

all kinds of text classification models and more with deep learning
MIT License
7.83k stars 2.57k forks

CNN Predict: Key b-1 not found in checkpoint #65

Open bikramkhastgir opened 6 years ago

bikramkhastgir commented 6 years ago

Hi @brightmart ,

I have trained the CNN using "train-zhihu4-only-title-all.txt". When I run the predict file on "test-zhihu6-title-desc.txt" with "zhihu-word2vec-title-desc.bin-100" as the word2vec model, I get the following error:

Restoring Variables from Checkpoint
2018-06-27 20:49:22.480037: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key b-1 not found in checkpoint
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key b-1 not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "p7_TextCNN_predict.py", line 77, in <module>
    saver.restore(sess, tf.train.latest_checkpoint(FLAGS.ckpt_dir))
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key b-1 not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "p7_TextCNN_predict.py", line 74, in <module>
    saver = tf.train.Saver()
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key b-1 not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


Python: 2.7 ... Can you help me figure this out, since there is no b-1 key in the checkpoint?

Thank you..

brightmart commented 6 years ago

Hi, the error comes from: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Try another way to load the pretrained word embedding:

    import gensim
    from gensim.models import KeyedVectors
    word2vec_model = KeyedVectors.load_word2vec_format(word2vec_model_path, binary=True, unicode_errors='ignore')

Or set the "use pretrain word embedding" flag to False.


From: IronMelter notifications@github.com Sent: 27 June 2018 23:14:43 To: brightmart/text_classification Cc: brightmart; Mention Subject: [brightmart/text_classification] CNN Predict: Key b-1 not found in checkpoint (#65)


Traceback (most recent call last):
  File "/home/user/bikram/temp/data_util_zhihu.py", line 27, in create_vocabulary
    vocabulary_word2index, vocabulary_index2word=pickle.load(data_f)
  File "/home/user/anaconda3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Traceback (most recent call last):
  File "/home/user/bikram/temp/data_util_zhihu.py", line 69, in create_vocabulary_label
    vocabulary_word2index_label, vocabulary_index2word_label=pickle.load(data_f)
  File "/home/user/anaconda3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
start padding....
end padding...
Restoring Variables from Checkpoint
2018-06-27 20:40:24.135961: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key b-1 not found in checkpoint
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key b-1 not found in checkpoint
  [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


bikramkhastgir commented 6 years ago

Hi,

The 'pretrain word embedding' flag is set to False. The utf-8 error occurred because the vocab pickle file was being opened in 'r' and 'a' modes instead of 'rb' and 'ab'. I have made those changes. Now only the "Key b-1 not found in checkpoint" error remains, without the utf-8 error. Any suggestions?

Thanks for your time, Bikram
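A minimal sketch of the file-mode fix described in this comment, assuming Python 3. The cache path and the tiny vocabulary are hypothetical stand-ins, not the repository's actual files. It writes and reads the pickle in binary mode, then shows that reading the same file in text mode fails:

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the vocabulary dictionaries cached by data_util_zhihu.py.
vocabulary_word2index = {"PAD": 0, "UNK": 1}
vocabulary_index2word = {0: "PAD", 1: "UNK"}

# Hypothetical cache location; the real script uses its own path.
cache_path = os.path.join(tempfile.mkdtemp(), "vocab_cache.pik")

# Pickle data is bytes, so the file is opened in binary mode for writing ...
with open(cache_path, "wb") as data_f:
    pickle.dump((vocabulary_word2index, vocabulary_index2word), data_f)

# ... and in binary mode for reading.
with open(cache_path, "rb") as data_f:
    loaded_word2index, loaded_index2word = pickle.load(data_f)

# Opening the same file in text mode makes Python try to decode the bytes as
# utf-8 first, which is where an error like the one in the traceback comes from.
text_mode_error = None
try:
    with open(cache_path, "r", encoding="utf-8") as data_f:
        pickle.load(data_f)
except (UnicodeDecodeError, TypeError) as err:
    text_mode_error = type(err).__name__

print(loaded_word2index == vocabulary_word2index, text_mode_error)
```

This only addresses the UnicodeDecodeError; it does not by itself explain the missing b-1 key.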

bikramkhastgir commented 6 years ago

The contents of the "checkpoint" file are:

model_checkpoint_path: "model.ckpt-9"
all_model_checkpoint_paths: "model.ckpt-5"
all_model_checkpoint_paths: "model.ckpt-6"
all_model_checkpoint_paths: "model.ckpt-7"
all_model_checkpoint_paths: "model.ckpt-8"
all_model_checkpoint_paths: "model.ckpt-9"

That is everything that gets written to it during training.
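For context, the "checkpoint" file is only a small text index of the saved checkpoint prefixes. Variable names such as b-1 are stored in the model.ckpt-N files themselves, so they would not be expected to appear here. As a rough sketch (the helper name `parse_checkpoint_index` is hypothetical and uses only the standard library, not a TensorFlow API), the index can be read like this:

```python
import re


def parse_checkpoint_index(text):
    """Return (latest_prefix, all_prefixes) from the text of a 'checkpoint' file."""
    latest = None
    all_paths = []
    # Each entry has the form: key: "value"
    for key, value in re.findall(r'(\w+):\s*"([^"]*)"', text):
        if key == "model_checkpoint_path":
            latest = value
        elif key == "all_model_checkpoint_paths":
            all_paths.append(value)
    return latest, all_paths


# The contents posted in this comment.
sample = '''model_checkpoint_path: "model.ckpt-9"
all_model_checkpoint_paths: "model.ckpt-5"
all_model_checkpoint_paths: "model.ckpt-6"
all_model_checkpoint_paths: "model.ckpt-7"
all_model_checkpoint_paths: "model.ckpt-8"
all_model_checkpoint_paths: "model.ckpt-9"
'''

latest, all_paths = parse_checkpoint_index(sample)
print(latest)
print(all_paths)
```

The latest prefix is the one that `tf.train.latest_checkpoint(FLAGS.ckpt_dir)` in the traceback resolves and passes to `saver.restore`.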

brightmart commented 6 years ago

Do you still get the same error?

bikramkhastgir commented 6 years ago

Yes. Shall I upload the files so you can try to reproduce the error on your system?

brightmart commented 6 years ago

ok.



bikramkhastgir commented 6 years ago

Hi @brightmart ,

Thank you for your help. The uploaded files are at the following URLs:

https://anonfile.com/oa3ef0f3bb/data_util.py
https://anonfile.com/p931f4fcb5/p7_TextCNN_predict.py
https://anonfile.com/q13af0fcb9/p7_TextCNN_train.py
https://anonfile.com/r033faf2b3/p8_TextRNN_model.py
https://anonfile.com/s533f5f5b1/p7_TextCNN_model.py
https://anonfile.com/tc3afbfbbe/data_util_zhihu.py
https://anonfile.com/u53af4feb9/p8_TextRNN_train.py

I have used data_util only in training for the CNN. The training file for both the CNN and the RNN is either 'train-zhihu4-only-title-all.txt', downloaded from the Zhihu URL, or 'sample_multiple_label.txt' from your repo.

The RNN also throws a "key not found" error, but it does so during training. The two errors differ slightly: the CNN fails while predicting and the RNN fails while training.

Note: The "use word embedding" flag is set to False while training the CNN. I am using

Regards,

bikramkhastgir commented 6 years ago

I am using TensorFlow 1.8.0.

kevinsay commented 6 years ago

@bikramkhastgir I modified the CNN program to train with a single label, and the error "Not found: Key b-1 not found in checkpoint" also appears when I predict. Has this error been solved? I need your help.

bikramkhastgir commented 6 years ago

No @kevinsay ... I couldn't figure out exactly which routine needs Key b-1, so I am still hoping @brightmart can figure it out.

switchhh commented 5 years ago

@bikramkhastgir Hi! Did you solve this problem? I hit the same error when I predict.

switchhh commented 5 years ago

@bikramkhastgir Hi, I think I found a way to solve this problem. The filter_nums array is different in train and predict. The name of each b variable in the model is defined as b-%s, where s is the filter_num.

bikramkhastgir commented 5 years ago

@switchhh Were you able to run it? I can see "num_filters" is 128 for both train and predict and "filter_sizes" = [6,7,8]

switchhh commented 5 years ago

@bikramkhastgir Yes, I can run it. The problem is that filter_sizes differs between train and predict (sorry, I wrote the wrong name earlier). In my copy it is [6,7,8] in train but [1,2,3,4,5,6,7,8] in prediction. I checked just now and the bug has been fixed, so you can try again. Sorry for my poor English.
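The mismatch described here can be illustrated without TensorFlow. This is only a sketch, assuming (per the comments above) that the model creates one bias variable per filter size, named with the pattern "b-%s". The helper `bias_names` is hypothetical and is not the repository's code:

```python
def bias_names(filter_sizes):
    # Assumption from the thread: one bias variable per filter size, named "b-<size>".
    return ["b-%s" % size for size in filter_sizes]


train_filter_sizes = [6, 7, 8]                    # value reported for training
predict_filter_sizes = [1, 2, 3, 4, 5, 6, 7, 8]   # value reported for prediction

# Names that exist in a checkpoint written by the training graph.
saved = set(bias_names(train_filter_sizes))

# Names the prediction graph asks the Saver to restore but the checkpoint lacks.
missing = [name for name in bias_names(predict_filter_sizes) if name not in saved]
print(missing)
```

Under that assumption, b-1 is among the variables that the prediction graph expects but the training run never saved, which is consistent with the "Key b-1 not found in checkpoint" error. Using the same filter_sizes in both scripts would remove the mismatch.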