training fails with "Model not found"

j0ma commented 7 years ago

Hi there,

I tried reading the docs and looked at the source code as well but couldn't find an answer, so I thought I'd open an issue:

When I run

g2p-seq2seq --train train.dic \
                          --model my_model_folder \
                          --size 2 \
                          --num_layers 2 \
                          --max_steps 10

I get the following output

Preparing G2P data
Creating vocabularies in my_model_folder
Creating vocabulary my_model_folder/vocab.phoneme
Creating vocabulary my_model_folder/vocab.grapheme
Reading development and training data.
Creating 2 layers of 2 units.
Created model with fresh parameters.
Training done.
Traceback (most recent call last):
  File "/Users/admin/anaconda/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.macosx-10.6-x86_64/egg/g2p_seq2seq/app.py", line 67, in main
  File "build/bdist.macosx-10.6-x86_64/egg/g2p_seq2seq/g2p.py", line 234, in train
  File "build/bdist.macosx-10.6-x86_64/egg/g2p_seq2seq/g2p.py", line 337, in evaluate
RuntimeError: Model not found in my_model_folder

It seems like the part that breaks the code is in the beginning of evaluate():

def evaluate(self, test_lines):
    """Calculate and print out word error rate (WER) and Accuracy
       on test sample.

    Args:
      test_lines: List of test dictionary. Each element of list must be String
                containing word and its pronounciation (e.g., "word W ER D");
    """
    if not hasattr(self, "model"):
      raise RuntimeError("Model not found in %s" % self.model_dir)

    # ...

Seems like the G2PModel instance isn't binding self.model to anything? Any idea if there is a simple fix available?
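
The failure mode can be reproduced in isolation. The sketch below uses a hypothetical `ToyG2PModel` class (not the real `G2PModel`) whose `train()` never assigns `self.model`, so the `hasattr` guard in `evaluate()` fires no matter how long training runs. It assumes the attribute is bound only on a separate loading path; all names other than the guard itself are made up for illustration.

```python
class ToyG2PModel(object):
    """Hypothetical stand-in for G2PModel; names are illustrative only."""

    def __init__(self, model_dir):
        self.model_dir = model_dir

    def load(self):
        # Assumption: only the loading path binds self.model.
        self.model = object()

    def train(self, max_steps):
        # Training work omitted; note that self.model is never assigned here.
        for _ in range(max_steps):
            pass

    def evaluate(self, test_lines):
        # Same guard as in the quoted evaluate().
        if not hasattr(self, "model"):
            raise RuntimeError("Model not found in %s" % self.model_dir)
        return len(test_lines)


def try_evaluate(g2p):
    """Return the error message raised by evaluate(), or None on success."""
    try:
        g2p.evaluate(["word W ER D"])
    except RuntimeError as err:
        return str(err)
    return None
```

Calling `train()` and then `try_evaluate()` on a fresh instance returns the error message; calling `load()` first makes it return `None`.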

The same error occurs if I switch to another conda virtual environment and reinstall g2p-seq2seq.

Thank you very much for all your help!

gorinars commented 7 years ago

Indeed, training does not seem to work. I guess the problem is the max_steps parameter: you should try at least 1000. Also note that size 2 is way too small (we used 64 to 512).

j0ma commented 7 years ago

Hey and thanks for your input!

After running

g2p-seq2seq --train train.dic \
                       --model my_model_folder \
                       --size 2 \
                       --num_layers 2 \
                       --max_steps 1500

I get

Preparing G2P data
Creating vocabularies in my_model_folder
Creating vocabulary my_model_folder/vocab.phoneme
Creating vocabulary my_model_folder/vocab.grapheme
Reading development and training data.
Creating 2 layers of 2 units.
WARNING:tensorflow:From /Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py:188 in __init__.: all_va
Instructions for updating:
Please use tf.global_variables instead.
Created model with fresh parameters.
global step 200 learning rate 0.5000 step-time 0.08 perplexity 14.98
  eval: bucket 0 perplexity 14.22
  eval: bucket 1 perplexity 4.89
  eval: bucket 2 perplexity 1.24
global step 400 learning rate 0.5000 step-time 0.06 perplexity 11.38
  eval: bucket 0 perplexity 15.97
  eval: bucket 1 perplexity 3.04
  eval: bucket 2 perplexity 1.17
global step 600 learning rate 0.5000 step-time 0.06 perplexity 10.22
  eval: bucket 0 perplexity 9.03
  eval: bucket 1 perplexity 2.40
  eval: bucket 2 perplexity 1.22
global step 800 learning rate 0.5000 step-time 0.07 perplexity 9.46
  eval: bucket 0 perplexity 6.27
  eval: bucket 1 perplexity 2.49
  eval: bucket 2 perplexity 1.09
global step 1000 learning rate 0.5000 step-time 0.06 perplexity 8.55
  eval: bucket 0 perplexity 6.75
  eval: bucket 1 perplexity 3.05
  eval: bucket 2 perplexity 1.03
global step 1200 learning rate 0.5000 step-time 0.06 perplexity 8.14
  eval: bucket 0 perplexity 6.77
  eval: bucket 1 perplexity 2.16
  eval: bucket 2 perplexity 1.02
global step 1400 learning rate 0.5000 step-time 0.07 perplexity 8.08
  eval: bucket 0 perplexity 9.32
  eval: bucket 1 perplexity 3.11
  eval: bucket 2 perplexity 1.02
Training done.
Traceback (most recent call last):
  File "/Users/admin/anaconda/envs/mypy3/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/app.py", line 67, in main
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3517, in get_controller
    yield default
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/app.py", line 67, in main
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/g2p.py", line 234, in train
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3517, in get_controller
    yield default
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/g2p.py", line 234, in train
  File "/Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/g2p.py", line 337, in evaluate
RuntimeError: Model not found in my_model_folder

I tried 5000 as well, but I still get the same error. Changing size didn't help either. I ran this inside a Python 3 virtual environment as well as Python 2, and I also tried fixing the TensorFlow warnings; I managed to get rid of them, but that didn't help.

What is interesting, however, is that running python setup.py test runs fine:

running test
running egg_info
writing entry points to g2p_seq2seq.egg-info/entry_points.txt
writing g2p_seq2seq.egg-info/PKG-INFO
writing top-level names to g2p_seq2seq.egg-info/top_level.txt
writing requirements to g2p_seq2seq.egg-info/requires.txt
writing dependency_links to g2p_seq2seq.egg-info/dependency_links.txt
reading manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'g2p_seq2seq.egg-info/SOURCES.txt'
running build_ext
test_decode (tests.g2p_unittest.TestG2P) ... /Users/admin/Documents/school/thesis/g2p-seq2seq/g2p_seq2seq/data_utils.py:132: ResourceWarning: unclosed file <_
  params = open(os.path.join(model_path, "model.params")).readlines()
Creating 1 layers of 2 units.
WARNING:tensorflow:From /Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py:188 in __init__.: all_va
Instructions for updating:
Please use tf.global_variables instead.
Reading model parameters from tests/models/decode
/Users/admin/Documents/school/thesis/g2p-seq2seq/tests/g2p_unittest.py:37: ResourceWarning: unclosed file <_io.TextIOWrapper name='tests/data/toydict.grapheme
  decode_lines = open("tests/data/toydict.graphemes").readlines()
cabcabbacab C
abcabac B C B
a B
ok
test_evaluate (tests.g2p_unittest.TestG2P) ... Creating 1 layers of 2 units.
WARNING:tensorflow:From /Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py:188 in __init__.: all_va
Instructions for updating:
Please use tf.global_variables instead.
Reading model parameters from tests/models/decode
/Users/admin/Documents/school/thesis/g2p-seq2seq/tests/g2p_unittest.py:26: ResourceWarning: unclosed file <_io.TextIOWrapper name='tests/data/toydict.test' mo
  test_lines = open("tests/data/toydict.test").readlines()
Beginning calculation word error rate (WER) on test sample.
Words: 3
Errors: 2
WER: 0.667
Accuracy: 0.333
ok
test_train (tests.g2p_unittest.TestG2P) ... Preparing G2P data
/Users/admin/Documents/school/thesis/g2p-seq2seq/g2p_seq2seq/data_utils.py:189: ResourceWarning: unclosed file <_io.BufferedReader name='tests/data/toydict.tr
  source_dic = codecs.open(train_path, "r", "utf-8").readlines()
/Users/admin/Documents/school/thesis/g2p-seq2seq/g2p_seq2seq/data_utils.py:192: ResourceWarning: unclosed file <_io.BufferedReader name='tests/data/toydict.te
  valid_dic = codecs.open(valid_path, "r", "utf-8").readlines()
/Users/admin/Documents/school/thesis/g2p-seq2seq/g2p_seq2seq/data_utils.py:194: ResourceWarning: unclosed file <_io.BufferedReader name='tests/data/toydict.te
  test_dic = codecs.open(test_path, "r", "utf-8").readlines()
Creating vocabularies in None
Reading development and training data.
Creating 1 layers of 2 units.
WARNING:tensorflow:From /Users/admin/anaconda/envs/mypy3/lib/python3.5/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py:188 in __init__.: all_va
Instructions for updating:
Please use tf.global_variables instead.
Created model with fresh parameters.
global step 1 learning rate 0.5000 step-time 0.65 perplexity 6.13
  eval: bucket 0 perplexity 5.64
  eval: bucket 1 perplexity 4.15
  eval: bucket 2 perplexity 3.95
global step 2 learning rate 0.5000 step-time 1.58 perplexity 4.04
  eval: bucket 0 perplexity 4.94
  eval: bucket 1 perplexity 3.35
  eval: bucket 2 perplexity 3.25
Training done.
ok

----------------------------------------------------------------------
Ran 3 tests in 98.293s

OK

Thanks again for your help; hopefully there is a fix to this!

gorinars commented 7 years ago

Could you check what is inside my_model_folder, if it was created? One more thing to try is specifying the absolute path: I wonder whether this could be a bug related to the Anaconda virtual env.
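
Both checks can be done in one step. This is a small sketch using only the standard library; the helper name `describe_model_dir` is made up for this example and is not part of g2p-seq2seq.

```python
import os


def describe_model_dir(path):
    """Return (absolute_path, sorted entries), or (absolute_path, None) if missing."""
    # Expand "~" and resolve relative paths against the current directory.
    abs_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(abs_path):
        return abs_path, None
    return abs_path, sorted(os.listdir(abs_path))
```

For example, `describe_model_dir("my_model_folder")` gives the absolute path to pass to `--model` along with the files found there.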

j0ma commented 7 years ago

Sure thing. Here's the contents of my_model_folder:

my_model_folder/
├── checkpoint
├── model.data-00000-of-00001
├── model.index
├── model.params
├── vocab.grapheme
└── vocab.phoneme

I uninstalled Anaconda and tried the default Python 2.7 that ships with OS X. I also switched to absolute paths. Unfortunately, when I run

g2p-seq2seq --train ~/Documents/school/thesis/testing_g2p/train.dic \
                       --model ~/Documents/school/thesis/testing_g2p/my_model_folder \
                       --size 8 --num_layers 2 --max_steps 1200

I get a similar error message:

Preparing G2P data
Creating vocabularies in /Users/admin/Documents/school/thesis/testing_g2p/my_model_folder
Creating vocabulary /Users/admin/Documents/school/thesis/testing_g2p/my_model_folder/vocab.phoneme
Creating vocabulary /Users/admin/Documents/school/thesis/testing_g2p/my_model_folder/vocab.grapheme
Reading development and training data.
Creating 2 layers of 8 units.
WARNING:tensorflow:From /usr/local/lib/python2.7/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py:188 in __init__.: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
Created model with fresh parameters.
global step 200 learning rate 0.5000 step-time 0.08 perplexity 11.44
  eval: bucket 0 perplexity 6.01
  eval: bucket 1 perplexity 10.58
  eval: bucket 2 perplexity 1.20
global step 400 learning rate 0.5000 step-time 0.07 perplexity 7.19
  eval: bucket 0 perplexity 8.91
  eval: bucket 1 perplexity 10.78
  eval: bucket 2 perplexity 1.12
global step 600 learning rate 0.5000 step-time 0.07 perplexity 4.78
  eval: bucket 0 perplexity 7.41
  eval: bucket 1 perplexity 6.39
  eval: bucket 2 perplexity 1.06
global step 800 learning rate 0.5000 step-time 0.07 perplexity 3.38
  eval: bucket 0 perplexity 10.42
  eval: bucket 1 perplexity 23.26
  eval: bucket 2 perplexity 1.01
global step 1000 learning rate 0.5000 step-time 0.06 perplexity 2.36
  eval: bucket 0 perplexity 6.32
  eval: bucket 1 perplexity 22.13
  eval: bucket 2 perplexity 1.09
global step 1200 learning rate 0.5000 step-time 0.06 perplexity 1.99
  eval: bucket 0 perplexity 36.97
  eval: bucket 1 perplexity 21.72
  eval: bucket 2 perplexity 1.00
Training done.
Traceback (most recent call last):
  File "/usr/local/bin/g2p-seq2seq", line 9, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.macosx-10.11-x86_64/egg/g2p_seq2seq/app.py", line 67, in main
  File "build/bdist.macosx-10.11-x86_64/egg/g2p_seq2seq/g2p.py", line 234, in train
  File "build/bdist.macosx-10.11-x86_64/egg/g2p_seq2seq/g2p.py", line 337, in evaluate
RuntimeError: Model not found in /Users/admin/Documents/school/thesis/testing_g2p/my_model_folder

Could this be somehow related to using OS X instead of Linux? Both are Unix-based so I'm not sure how that would be logical, but at this point I'm pretty confused as to what the problem is.

gorinars commented 7 years ago

Hi again. Together with @nurtas-m, we finally figured out that there seems to be a bug with TensorFlow versions newer than 0.11: the way checkpoints are stored changed. Could you please downgrade to TensorFlow 0.10 or 0.11 and check again? If that works, I will create another issue.
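
The directory listing earlier in the thread shows `model.index` and `model.data-00000-of-00001` but no plain `model` file, so a check that looks only for a file named `model` would not find anything. Below is a minimal sketch of a check that tolerates both layouts, assuming the checkpoint prefix is `model`. The function name and the "v1"/"v2" labels are my own, not part of g2p-seq2seq or TensorFlow, and this is not necessarily how the actual fix should look.

```python
import glob
import os


def find_checkpoint_format(model_dir, prefix="model"):
    """Return "v1", "v2", or None depending on which checkpoint files exist."""
    base = os.path.join(model_dir, prefix)
    # Newer layout: an index file plus one or more data shards.
    if os.path.isfile(base + ".index") and glob.glob(base + ".data-*"):
        return "v2"
    # Older layout: a single file named exactly after the prefix.
    if os.path.isfile(base):
        return "v1"
    return None
```

Run against the `my_model_folder` contents listed above, this would report the newer layout, which matches the explanation that the storage format changed.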

cmusphinx / g2p-seq2seq

training fails with "Model not found" #57