da03 / Attention-OCR

Visual Attention based OCR

Error when trying to train model on my own dataset #42

Closed mehulmshah closed 7 years ago

mehulmshah commented 7 years ago

Hi,

I'm trying to run your code to train on my own dataset, but I get a ValueError. This happens even with the toy dataset given in the README.

Here is the full output when I try to train:

```
Attention-OCR mehulshah$ python src/launcher.py --phase=train --data-path=train-path.txt --data-base-dir=/ --log-path=log.txt --no-load-model
2017-06-02 11:23:40.694013: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-02 11:23:40.694034: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-02 11:23:40.694039: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-02 11:23:40.694043: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-02 11:23:40.694066: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-02 11:23:40,694 root INFO loading data
2017-06-02 11:23:40,695 root INFO phase: train
2017-06-02 11:23:40,695 root INFO model_dir: train
2017-06-02 11:23:40,695 root INFO load_model: False
2017-06-02 11:23:40,695 root INFO output_dir: results
2017-06-02 11:23:40,696 root INFO steps_per_checkpoint: 500
2017-06-02 11:23:40,696 root INFO batch_size: 64
2017-06-02 11:23:40,696 root INFO num_epoch: 1000
2017-06-02 11:23:40,696 root INFO learning_rate: 1
2017-06-02 11:23:40,696 root INFO reg_val: 0
2017-06-02 11:23:40,696 root INFO max_gradient_norm: 5.000000
2017-06-02 11:23:40,697 root INFO clip_gradients: True
2017-06-02 11:23:40,697 root INFO valid_target_length inf
2017-06-02 11:23:40,697 root INFO target_vocab_size: 39
2017-06-02 11:23:40,697 root INFO target_embedding_size: 10.000000
2017-06-02 11:23:40,697 root INFO attn_num_hidden: 128
2017-06-02 11:23:40,697 root INFO attn_num_layers: 2
2017-06-02 11:23:40,698 root INFO visualize: True
2017-06-02 11:23:40,698 root INFO buckets
2017-06-02 11:23:40,698 root INFO [(16, 11), (27, 17), (35, 19), (64, 22), (80, 32)]
input_tensor dim: (?, 1, 32, ?)
CNN outdim before squeeze: (?, 1, ?, 512)
CNN outdim: (?, ?, 512)
Traceback (most recent call last):
  File "src/launcher.py", line 146, in <module>
    main(sys.argv[1:], exp_config.ExpConfig)
  File "src/launcher.py", line 142, in main
    session = sess)
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/model.py", line 151, in __init__
    use_gru = use_gru)
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/seq2seq_model.py", line 141, in __init__
    softmax_loss_function=softmax_loss_function)
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/seq2seq.py", line 993, in model_with_buckets
    decoder_inputs[:int(bucket[1])], int(bucket[0]))
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/seq2seq_model.py", line 140, in <lambda>
    self.target_weights, buckets, lambda x, y, z: seq2seq_f(x, y, z, False),
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/seq2seq_model.py", line 122, in seq2seq_f
    attn_num_hidden = attn_num_hidden)
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/seq2seq.py", line 675, in embedding_attention_decoder
    initial_state_attention=initial_state_attention, attn_num_hidden=attn_num_hidden)
  File "/Users/mehulshah/Documents/Ongoing/LPR/Attention-OCR/src/model/seq2seq.py", line 577, in attention_decoder
    cell_output, state = cell(x, state)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 953, in __call__
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 235, in __call__
    with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 77, in _checked_scope
    type(cell).__name__))
```

And here is the error:

ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x11a4d0d10> with a different variable scope than its first use. First use of cell was with scope 'embedding_attention_decoder/attention_decoder/multi_rnn_cell/cell_0/basic_lstm_cell', this attempt is with scope 'embedding_attention_decoder/attention_decoder/multi_rnn_cell/cell_1/basic_lstm_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: `MultiRNNCell([BasicLSTMCell(...)] * num_layers)`, change to: `MultiRNNCell([BasicLSTMCell(...) for _ in range(num_layers)])`. If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)
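Concretely, my reading of that suggestion is something like the sketch below, using the `tf.contrib.rnn` API from TensorFlow 1.1 (the variable names are mine for illustration, not the ones in this repo's seq2seq_model.py):

```python
import tensorflow as tf

num_layers = 2    # attn_num_layers from the log above
num_hidden = 128  # attn_num_hidden from the log above

# Old pattern: one cell object shared across all layers.
# TF >= 1.1 rejects this with the ValueError above.
# cell = tf.contrib.rnn.BasicLSTMCell(num_hidden)
# stacked = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)

# Suggested pattern: a fresh cell instance (and thus fresh variables) per layer.
stacked = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(num_hidden) for _ in range(num_layers)])
```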

I tried changing the following in seq2seq_model.py:

If before you were using: `MultiRNNCell([BasicLSTMCell(...)] * num_layers)`, change to: `MultiRNNCell([BasicLSTMCell(...) for _ in range(num_layers)])`

But it still does not work. Any suggestions?

mehulmshah commented 7 years ago

Never mind, I figured it out. Downgrading to TensorFlow 1.0.0 solves the issue.
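For anyone else hitting this: assuming a plain pip install of the CPU build, the downgrade is just a version pin (use tensorflow-gpu==1.0.0 instead if you are on the GPU build):

```
pip install tensorflow==1.0.0
```

The reuse check that raises this ValueError was introduced after 1.0.0, which is why pinning that version avoids it.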