kernel not found in checkpoint

loretoparisi commented 6 years ago

I installed TF 1.5.0, since the latest merge is supposed to work with TF 1.4 and 1.5, but I get this error when running g2p-seq2seq --interactive --model g2p-seq2seq-cmudict:

Traceback (most recent call last):
  File "/usr/local/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 92, in main
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 106, in load_decode_model
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1686, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key embedding_attention_seq2seq/rnn/embedding_wrapper/multi_rnn_cell/cell_0/gru_cell/gates/kernel not found in checkpoint
     [[Node: save_1/RestoreV2_17 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_17/tensor_names, save_1/RestoreV2_17/shape_and_slices)]]

Caused by op u'save_1/RestoreV2_17', defined at:
  File "/usr/local/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 92, in main
    g2p_model.load_decode_model()
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 102, in load_decode_model
    self.model.saver = tf.train.Saver(tf.global_variables(), max_to_keep=1)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1239, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 765, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 268, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1031, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key embedding_attention_seq2seq/rnn/embedding_wrapper/multi_rnn_cell/cell_0/gru_cell/gates/kernel not found in checkpoint
     [[Node: save_1/RestoreV2_17 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_17/tensor_names, save_1/RestoreV2_17/shape_and_slices)]]

I have the model installed locally:

root@57901d9107da:~# ls -l g2p-seq2seq-cmudict/
total 30076
-rw-r--r-- 1 1000 1000       67 Mar  7  2017 checkpoint
-rw-r--r-- 1 1000 1000 30774852 Mar  7  2017 model.data-00000-of-00001
-rw-r--r-- 1 1000 1000     1529 Mar  7  2017 model.index
-rw-r--r-- 1 1000 1000       21 Mar  7  2017 model.params
-rw-r--r-- 1 1000 1000       73 Mar  7  2017 vocab.grapheme
-rw-r--r-- 1 1000 1000      120 Mar  7  2017 vocab.phoneme
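
A NotFoundError like the one above means the graph asks for a variable name that the checkpoint does not store. One way to check this is to list the names saved in the checkpoint and compare them with the name in the error. The following is a minimal sketch, assuming TensorFlow 1.x is installed and that the checkpoint prefix is g2p-seq2seq-cmudict/model (inferred from the model.index and model.data-* files in the listing). The helper names missing_keys and checkpoint_variable_names are hypothetical, and the "weights" name in the example is an illustrative guess, not something read from the real checkpoint. TensorFlow 1.2 did rename RNN cell variables from weights/biases to kernel/bias, so a checkpoint saved in March 2017 may plausibly use the older names.

```python
def missing_keys(graph_names, checkpoint_names):
    """Return the graph variable names that have no entry in the checkpoint."""
    return sorted(set(graph_names) - set(checkpoint_names))


def checkpoint_variable_names(checkpoint_prefix):
    """List the variable names stored in a TF 1.x checkpoint.

    checkpoint_prefix is the path without the .index/.data suffix,
    e.g. "g2p-seq2seq-cmudict/model" (an assumption based on the listing).
    """
    # Imported lazily so that missing_keys can be used without TensorFlow.
    import tensorflow as tf
    reader = tf.train.NewCheckpointReader(checkpoint_prefix)
    return sorted(reader.get_variable_to_shape_map().keys())


# Name taken from the error message above.
wanted = [
    "embedding_attention_seq2seq/rnn/embedding_wrapper/"
    "multi_rnn_cell/cell_0/gru_cell/gates/kernel",
]
# HYPOTHETICAL stored name, for illustration only; replace it with
# checkpoint_variable_names("g2p-seq2seq-cmudict/model") on a real setup.
stored = [
    "embedding_attention_seq2seq/rnn/embedding_wrapper/"
    "multi_rnn_cell/cell_0/gru_cell/gates/weights",
]
not_in_checkpoint = missing_keys(wanted, stored)
```

If the real checkpoint lists the same variables under different names, the model was saved by a TensorFlow version with a different naming scheme, and retraining with the installed version is the simplest way out.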

loretoparisi commented 6 years ago

[UPDATE]

The pre-trained model now seems to be broken with TF 1.1 as well:

Creating 2 layers of 512 units.
Traceback (most recent call last):
  File "/usr/local/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 92, in main
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 101, in load_decode_model
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/seq2seq_model.py", line 164, in __init__
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1201, in model_with_buckets
    decoder_inputs[:bucket[1]])
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/seq2seq_model.py", line 163, in <lambda>
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/seq2seq_model.py", line 140, in seq2seq_f
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 855, in embedding_attention_seq2seq
    encoder_cell, encoder_inputs, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn.py", line 197, in static_rnn
    (output, state) = call_cell()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn.py", line 184, in <lambda>
    call_cell = lambda: cell(input_, state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 881, in __call__
    return self._cell(embedded, state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 953, in __call__
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 146, in __call__
    with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 77, in _checked_scope
    type(cell).__name__))
ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.GRUCell object at 0x7f70be2d3750> with a different variable scope than its first use.  First use of cell was with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_1/gru_cell'.  Please create a new instance of the cell if you would like it to use a different set of weights.  If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]).  If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse).  In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)
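
The ValueError above contrasts MultiRNNCell([GRUCell(...)] * num_layers) with MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). The difference is plain Python list behaviour and can be shown without TensorFlow. The sketch below uses FakeCell, a hypothetical stand-in class (not a TensorFlow API), with the 2 layers of 512 units reported in the log.

```python
class FakeCell(object):
    """Hypothetical stand-in for an RNN cell; not a TensorFlow class."""

    def __init__(self, num_units):
        self.num_units = num_units


def shared_layers(num_units, num_layers):
    # [cell] * n repeats a reference to ONE object n times, so every
    # layer would try to use the same cell (and the same weights).
    return [FakeCell(num_units)] * num_layers


def separate_layers(num_units, num_layers):
    # The comprehension calls the constructor once per layer, so each
    # layer gets its own cell instance.
    return [FakeCell(num_units) for _ in range(num_layers)]


def distinct_instances(cells):
    """Count how many different objects the list actually holds."""
    return len(set(id(cell) for cell in cells))


shared = shared_layers(512, 2)
separate = separate_layers(512, 2)
```

Here distinct_instances(shared) is 1 and distinct_instances(separate) is 2. Per the error message, TF 1.1 rejects the first pattern because the single cell would be used under two different variable scopes (cell_0 and cell_1).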

lifefeel commented 6 years ago

@loretoparisi the TF version has changed, so you need to retrain the model. Run training first, then run interactive mode.

loretoparisi commented 6 years ago

@lifefeel thanks! I'm now training with TF 1.5.0. Do you think the latest TF 1.6 GPU build would work as well?

loretoparisi commented 6 years ago

@roeeaharoni you just have to download the CMU dictionary and run training on it; see here for the link: https://github.com/cmusphinx/g2p-seq2seq#training-g2p-system

roeeaharoni commented 6 years ago

Works, thanks!

loretoparisi commented 6 years ago

Closing this since it was basically solved!

lifefeel commented 6 years ago

@loretoparisi I'm using TF 1.6.0 and it works well.

cmusphinx / g2p-seq2seq

kernel not found in checkpoint #102