This is an out-of-memory exception. You can try lowering the batch size or changing the threshold for pruning longer sentences (https://github.com/Rayhane-mamah/Tacotron-2/blob/master/hparams.py#L70).
Generally, with an 11 GB GPU you should be able to use a batch size of 32 with `max_mel_frames` set to 800-900.
You can lower those values and watch `nvidia-smi` to find the maximum your GPU can handle.
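For anyone who wants to experiment without editing hparams.py: train.py in this repo accepts a `--hparams` override string, and the two relevant knobs are the ones named above (`tacotron_batch_size` and `max_mel_frames`, with the defaults visible in the dump further down this thread). A minimal sketch of the override mechanism, assuming TF 1.x; the values 16 and 500 are just illustrative starting points, not recommendations:

```python
import tensorflow as tf

# The two memory-relevant knobs from this repo's hparams.py,
# initialized here with the defaults shown in the dump below.
hparams = tf.contrib.training.HParams(
    tacotron_batch_size=32,
    max_mel_frames=800,
)

# HParams.parse takes the same comma-separated string you would pass
# on the command line; train.py forwards its --hparams argument to it.
hparams.parse("tacotron_batch_size=16,max_mel_frames=500")
print(hparams.tacotron_batch_size, hparams.max_mel_frames)  # 16 500
```

Keeping `watch -n 1 nvidia-smi` open in a second terminal while you nudge these values back up is the easiest way to find the ceiling.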
I have a 6 GB GPU (GTX 1660 Ti) and I reduced `max_mel_frames` to 100, but I'm still getting the same error.
Hmm, did you re-run preprocess.py? Wavs that are too long are discarded during preprocessing.
Yup, it works after re-running preprocess.py with `max_mel_frames=400`. Thanks! :)
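For reference, the frame limit translates directly into a maximum clip duration via `hop_size` and `sample_rate`. A quick back-of-the-envelope script, not project code; the two constants are taken from the hyperparameter dump below:

```python
# Each mel frame advances hop_size samples, so a clip survives preprocessing
# only if duration_seconds * sample_rate / hop_size <= max_mel_frames.
hop_size = 275        # from the hparams dump in this thread
sample_rate = 22050   # from the hparams dump in this thread

for max_mel_frames in (100, 400, 800):
    max_seconds = max_mel_frames * hop_size / sample_rate
    print(f"max_mel_frames={max_mel_frames:>3} -> clips longer than ~{max_seconds:.1f}s are dropped")
```

So `max_mel_frames=100` keeps only clips under ~1.2 s, while 400 allows ~5 s. That is also why re-running preprocess.py matters: the length filter is applied when the features are generated, not at training time.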
**When I run `python train.py --model='Tacotron-2'`, I get the output below, with an error at the end. Please help me with this; I can't tell where the problem is.**
```
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:
Using TensorFlow backend.
#############################################################
Tacotron Train
#############################################################
Checkpoint path: logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt
Loading training data from: training_data/train.txt
Using model: Tacotron-2
Hyperparameters:
  GL_on_GPU: True
  NN_init: True
  NN_scaler: 0.3
  allow_clipping_in_normalization: True
  attention_dim: 128
  attention_filters: 32
  attention_kernel: (31,)
  attention_win_size: 7
  batch_norm_position: after
  cbhg_conv_channels: 128
  cbhg_highway_units: 128
  cbhg_highwaynet_layers: 4
  cbhg_kernels: 8
  cbhg_pool_size: 2
  cbhg_projection: 256
  cbhg_projection_kernel_size: 3
  cbhg_rnn_units: 128
  cdf_loss: False
  cin_channels: 80
  cleaners: english_cleaners
  clip_for_wavenet: True
  clip_mels_length: True
  clip_outputs: True
  cross_entropy_pos_weight: 1
  cumulative_weights: True
  decoder_layers: 2
  decoder_lstm_units: 1024
  embedding_dim: 512
  enc_conv_channels: 512
  enc_conv_kernel_size: (5,)
  enc_conv_num_layers: 3
  encoder_lstm_units: 256
  fmax: 7600
  fmin: 55
  frame_shift_ms: None
  freq_axis_kernel_size: 3
  gate_channels: 256
  gin_channels: -1
  griffin_lim_iters: 60
  hop_size: 275
  input_type: raw
  kernel_size: 3
  layers: 20
  leaky_alpha: 0.4
  legacy: True
  log_scale_min: -32.23619130191664
  log_scale_min_gauss: -16.11809565095832
  lower_bound_decay: 0.1
  magnitude_power: 2.0
  mask_decoder: False
  mask_encoder: True
  max_abs_value: 4.0
  max_iters: 10000
  max_mel_frames: 800
  max_time_sec: None
  max_time_steps: 11000
  min_level_db: -100
  n_fft: 2048
  n_speakers: 5
  normalize_for_wavenet: True
  num_freq: 1025
  num_mels: 80
  out_channels: 2
  outputs_per_step: 1
  postnet_channels: 512
  postnet_kernel_size: (5,)
  postnet_num_layers: 5
  power: 1.5
  predict_linear: True
  preemphasis: 0.97
  preemphasize: True
  prenet_layers: [256, 256]
  quantize_channels: 65536
  ref_level_db: 20
  rescale: True
  rescaling_max: 0.999
  residual_channels: 128
  residual_legacy: True
  sample_rate: 22050
  signal_normalization: True
  silence_threshold: 2
  skip_out_channels: 128
  smoothing: False
  speakers: ['speaker0', 'speaker1', 'speaker2', 'speaker3', 'speaker4']
  speakers_path: None
  split_on_cpu: True
  stacks: 2
  stop_at_any: True
  symmetric_mels: True
  synthesis_constraint: False
  synthesis_constraint_type: window
  tacotron_adam_beta1: 0.9
  tacotron_adam_beta2: 0.999
  tacotron_adam_epsilon: 1e-06
  tacotron_batch_size: 32
  tacotron_clip_gradients: True
  tacotron_data_random_state: 1234
  tacotron_decay_learning_rate: True
  tacotron_decay_rate: 0.5
  tacotron_decay_steps: 18000
  tacotron_dropout_rate: 0.5
  tacotron_final_learning_rate: 0.0001
  tacotron_fine_tuning: False
  tacotron_initial_learning_rate: 0.001
  tacotron_natural_eval: False
  tacotron_num_gpus: 1
  tacotron_random_seed: 5339
  tacotron_reg_weight: 1e-06
  tacotron_scale_regularization: False
  tacotron_start_decay: 40000
  tacotron_swap_with_cpu: False
  tacotron_synthesis_batch_size: 1
  tacotron_teacher_forcing_decay_alpha: None
  tacotron_teacher_forcing_decay_steps: 40000
  tacotron_teacher_forcing_final_ratio: 0.0
  tacotron_teacher_forcing_init_ratio: 1.0
  tacotron_teacher_forcing_mode: constant
  tacotron_teacher_forcing_ratio: 1.0
  tacotron_teacher_forcing_start_decay: 10000
  tacotron_test_batches: None
  tacotron_test_size: 0.05
  tacotron_zoneout_rate: 0.1
  train_with_GTA: True
  trim_fft_size: 2048
  trim_hop_size: 512
  trim_silence: True
  trim_top_db: 40
  upsample_activation: Relu
  upsample_scales: [11, 25]
  upsample_type: SubPixel
  use_bias: True
  use_lws: False
  use_speaker_embedding: True
  wavenet_adam_beta1: 0.9
  wavenet_adam_beta2: 0.999
  wavenet_adam_epsilon: 1e-06
  wavenet_batch_size: 8
  wavenet_clip_gradients: True
  wavenet_data_random_state: 1234
  wavenet_debug_mels: ['training_data/mels/mel-LJ001-0008.npy']
  wavenet_debug_wavs: ['training_data/audio/audio-LJ001-0008.npy']
  wavenet_decay_rate: 0.5
  wavenet_decay_steps: 200000
  wavenet_dropout: 0.05
  wavenet_ema_decay: 0.9999
  wavenet_gradient_max_norm: 100.0
  wavenet_gradient_max_value: 5.0
  wavenet_init_scale: 1.0
  wavenet_learning_rate: 0.001
  wavenet_lr_schedule: exponential
  wavenet_natural_eval: False
  wavenet_num_gpus: 1
  wavenet_pad_sides: 1
  wavenet_random_seed: 5339
  wavenet_swap_with_cpu: False
  wavenet_synth_debug: False
  wavenet_synthesis_batch_size: 20
  wavenet_test_batches: 1
  wavenet_test_size: None
  wavenet_warmup: 4000.0
  wavenet_weight_normalization: False
  win_size: 1100
Loaded metadata for 13100 examples (23.76 hours)
WARNING:tensorflow:From /home/shubhi/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/tacotron.py:64: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:100: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:387: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv1d instead.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:388: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.batch_normalization instead.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:391: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /home/shubhi/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:215: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API
WARNING:tensorflow:From /home/shubhi/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:443: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
WARNING:tensorflow:From /home/shubhi/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:626: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:279: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:246: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:34: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/shubhi/SanskritTextToSpeech/Tacotron-2/tacotron/models/modules.py:53: max_pooling1d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling1d instead.
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
  Train mode:               True
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out:              (?, ?, 512)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  linear out:               (?, ?, 1025)
```