Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

Error training Tacotron without Wavenet #432

Open manuel3265 opened 4 years ago

manuel3265 commented 4 years ago

I'm getting the error below when I train Tacotron.

I have already set tacotron_batch_size to 4 in hparams.py, but the OOM still happens.
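For reference, this is the change I made (paraphrased; the stock hparams.py defines tacotron_batch_size = 32 inside the tf.contrib.training.HParams(...) block):

    # hparams.py -- inside the tf.contrib.training.HParams(...) definition
    # default is 32; lowered to 4 to try to fit in GPU memory
    tacotron_batch_size = 4,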

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:100: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:387: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv1d instead.
WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:388: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.batch_normalization instead.
WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:391: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:215: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use keras.layers.Bidirectional(keras.layers.RNN(cell)), which is equivalent to this API
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn.py:443: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use keras.layers.RNN(cell), which is equivalent to this API
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn.py:626: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:279: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:246: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.

initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
  Train mode:               True
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out:              (?, ?, 512)
  decoder out:              (?, ?, 20)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 20)
  mel out:                  (?, ?, 20)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       26.891 Million.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:667: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
  Train mode:               False
  Eval mode:                True
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out:              (?, ?, 512)
  decoder out:              (?, ?, 20)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 20)
  mel out:                  (?, ?, 20)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       26.891 Million.
Tacotron training set to a maximum of 200000 steps
Loading checkpoint logs-Tacotron/taco_pretrained/tacotron_model.ckpt-0
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Generated 64 train batches of size 32 in 2.408 sec
Generated 149 test batches of size 32 in 3.775 sec
Step 1 [10.147 sec/step, loss=14.91386, avg_loss=14.91386]
Saving Model Character Embeddings visualization..
Tacotron Character embeddings have been updated on tensorboard!
Exiting due to exception: OOM when allocating tensor with shape[32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
   [[node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:134) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
   [[node Tacotron_model/clip_by_global_norm/mul_38 (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py:429) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform', defined at:
  File "train.py", line 138, in <module>
    main()
  File "train.py", line 128, in main
    tacotron_train(args, log_dir, hparams)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 399, in tacotron_train
    return train(log_dir, args, hparams)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 156, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 91, in model_train_mode
    is_training=True, split_infos=feeder.split_infos)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py", line 173, in initialize
    swap_memory=hp.tacotron_swap_with_cpu)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 322, in dynamic_decode
    swap_memory=swap_memory)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3556, in while_loop
    return_same_structure)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3087, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3022, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 265, in body
    decoder_finished) = decoder.step(time, inputs, state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/custom_decoder.py", line 118, in step
    (cell_outputs, stop_token), cell_state = self._cell(inputs, state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/Architecture_wrappers.py", line 177, in __call__
    LSTM_output, next_cell_state = self._cell(LSTM_input, state.cell_state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py", line 283, in __call__
    return self._cell(inputs, states)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 234, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 530, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1515, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py", line 134, in __call__
    h = (1 - self._zoneout_outputs) * tf.nn.dropout(new_h - prev_h, (1 - self._zoneout_outputs)) + prev_h
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 2979, in dropout
    return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 3048, in dropout_v2
    noise_shape, seed=seed, dtype=x.dtype)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/random_ops.py", line 247, in random_uniform
    rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed1, seed2=seed2)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 777, in random_uniform
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
   [[node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:134) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
   [[node Tacotron_model/clip_by_global_norm/mul_38 (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py:429) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
   [[{{node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
   [[{{node Tacotron_model/clip_by_global_norm/mul_38}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 225, in train
    step, loss, opt = sess.run([global_step, model.loss, model.optimize])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
   [[node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:134) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
   [[node Tacotron_model/clip_by_global_norm/mul_38 (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py:429) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform', defined at:
  File "train.py", line 138, in <module>
    main()
  File "train.py", line 128, in main
    tacotron_train(args, log_dir, hparams)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 399, in tacotron_train
    return train(log_dir, args, hparams)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 156, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/train.py", line 91, in model_train_mode
    is_training=True, split_infos=feeder.split_infos)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py", line 173, in initialize
    swap_memory=hp.tacotron_swap_with_cpu)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 322, in dynamic_decode
    swap_memory=swap_memory)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3556, in while_loop
    return_same_structure)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3087, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3022, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 265, in body
    decoder_finished) = decoder.step(time, inputs, state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/custom_decoder.py", line 118, in step
    (cell_outputs, stop_token), cell_state = self._cell(inputs, state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/Architecture_wrappers.py", line 177, in __call__
    LSTM_output, next_cell_state = self._cell(LSTM_input, state.cell_state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py", line 283, in __call__
    return self._cell(inputs, states)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 234, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 530, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1515, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py", line 134, in __call__
    h = (1 - self._zoneout_outputs) * tf.nn.dropout(new_h - prev_h, (1 - self._zoneout_outputs)) + prev_h
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 2979, in dropout
    return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 3048, in dropout_v2
    noise_shape, seed=seed, dtype=x.dtype)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/random_ops.py", line 247, in random_uniform
    rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed1, seed2=seed2)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 777, in random_uniform
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
   [[node Tacotron_model/inference/decoder/while/CustomDecoderStep/decoder_LSTM/decoder_LSTM/multi_rnn_cell/cell_1/dropout_1/random_uniform/RandomUniform (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/modules.py:134) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
   [[node Tacotron_model/clip_by_global_norm/mul_38 (defined at /home/manuel_garcia02/Tacotron-2/tacotron/models/tacotron.py:429) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
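In case someone wants the allocation report that the hint above mentions: with the TF 1.x API it can be enabled by passing RunOptions to the session call (a minimal sketch; per the traceback, the training sess.run lives around tacotron/train.py line 225):

    # sketch: ask TF 1.x to dump live tensor allocations when an OOM is raised
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)
    step, loss, opt = sess.run([global_step, model.loss, model.optimize],
                               options=run_options)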
kennethnakasone commented 4 years ago

I see the lines

tacotron_batch_size: 32
...
...
Generated 64 train batches of size 32 in 2.408 sec
Generated 149 test batches of size 32 in 3.775 sec

in there, and the OOM tensor shape [32,1024] also points to a batch of 32. Are you sure the batch_size = 4 change is taking effect?
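If the hparams.py edit is somehow not being picked up, you can force the value from the command line instead; the stock train.py accepts an --hparams string of comma-separated name=value overrides:

    python train.py --model='Tacotron' --hparams='tacotron_batch_size=4'

If the next run still prints "batches of size 32", something else is resetting the value.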

For your reference: I've tried setting batch_size down to 4 before, and I believe it used something like 2.4GB of VRAM, so if you've got a GPU with 3GB+ of VRAM, the application should at least run.
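One more knob that is visible in your traceback is hp.tacotron_swap_with_cpu (tacotron.py passes it to dynamic_decode as swap_memory). If you're still tight on memory after shrinking the batch, enabling it lets the decoder loop swap tensors to host RAM at some speed cost (a sketch of the hparams.py edit; the default is False in the stock file):

    # hparams.py -- allow the decoder while-loop to swap tensors to CPU memory
    # (can slow training noticeably, but lowers peak GPU memory)
    tacotron_swap_with_cpu = True,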