CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Resource exhausted: OOM when allocating tensor with shape[36,512,1,702] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc #651

Closed: tailname closed this issue 3 years ago

tailname commented 3 years ago

Hello. Please help me, I do not know how to solve my problem. I ran both of these and they completed without errors:

    python synthesizer_preprocess_audio.py <datasets_root>
    python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer

but after typing

    python synthesizer_train.py my_run <datasets_root>/SV2TTS/synthesizer

it shows me a long error:

Arguments:
    name:                   my_run
    synthesizer_root:       C:\Users\matve\Documents\Tacotron\datasets\SV2TTS\synthesizer
    models_dir:             synthesizer/saved_models/
    mode:                   synthesis
    GTA:                    True
    restore:                True
    summary_interval:       2500
    embedding_interval:     10000
    checkpoint_interval:    2000
    eval_interval:          100000
    tacotron_train_steps:   2000000
    tf_log_level:           1
    slack_url:              None
    hparams:                

Checkpoint path: synthesizer/saved_models/logs-my_run\taco_pretrained\tacotron_model.ckpt
Loading training data from: C:\Users\matve\Documents\Tacotron\datasets\SV2TTS\synthesizer\train.txt
Using model: Tacotron
Hyperparameters:
  allow_clipping_in_normalization: True
  attention_dim: 128
  attention_filters: 32
  attention_kernel: (31,)
  cbhg_conv_channels: 128
  cbhg_highway_units: 128
  cbhg_highwaynet_layers: 4
  cbhg_kernels: 8
  cbhg_pool_size: 2
  cbhg_projection: 256
  cbhg_projection_kernel_size: 3
  cbhg_rnn_units: 128
  cleaners: english_cleaners
  clip_for_wavenet: True
  clip_mels_length: True
  cross_entropy_pos_weight: 20
  cumulative_weights: True
  decoder_layers: 2
  decoder_lstm_units: 1024
  embedding_dim: 512
  enc_conv_channels: 512
  enc_conv_kernel_size: (5,)
  enc_conv_num_layers: 3
  encoder_lstm_units: 256
  fmax: 7600
  fmin: 55
  frame_shift_ms: None
  griffin_lim_iters: 60
  hop_size: 200
  mask_decoder: False
  mask_encoder: True
  max_abs_value: 4.0
  max_iters: 2000
  max_mel_frames: 900
  min_level_db: -100
  n_fft: 800
  natural_eval: False
  normalize_for_wavenet: True
  num_mels: 80
  outputs_per_step: 2
  postnet_channels: 512
  postnet_kernel_size: (5,)
  postnet_num_layers: 5
  power: 1.5
  predict_linear: False
  preemphasis: 0.97
  preemphasize: True
  prenet_layers: [256, 256]
  ref_level_db: 20
  rescale: True
  rescaling_max: 0.9
  sample_rate: 16000
  signal_normalization: True
  silence_min_duration_split: 0.4
  silence_threshold: 2
  smoothing: False
  speaker_embedding_size: 256
  split_on_cpu: True
  stop_at_any: True
  symmetric_mels: True
  tacotron_adam_beta1: 0.9
  tacotron_adam_beta2: 0.999
  tacotron_adam_epsilon: 1e-06
  tacotron_batch_size: 36
  tacotron_clip_gradients: True
  tacotron_data_random_state: 1234
  tacotron_decay_learning_rate: True
  tacotron_decay_rate: 0.5
  tacotron_decay_steps: 50000
  tacotron_dropout_rate: 0.5
  tacotron_final_learning_rate: 1e-05
  tacotron_gpu_start_idx: 0
  tacotron_initial_learning_rate: 0.001
  tacotron_num_gpus: 1
  tacotron_random_seed: 5339
  tacotron_reg_weight: 1e-07
  tacotron_scale_regularization: False
  tacotron_start_decay: 50000
  tacotron_swap_with_cpu: False
  tacotron_synthesis_batch_size: 128
  tacotron_teacher_forcing_decay_alpha: 0.0
  tacotron_teacher_forcing_decay_steps: 280000
  tacotron_teacher_forcing_final_ratio: 0.0
  tacotron_teacher_forcing_init_ratio: 1.0
  tacotron_teacher_forcing_mode: constant
  tacotron_teacher_forcing_ratio: 1.0
  tacotron_teacher_forcing_start_decay: 10000
  tacotron_test_batches: None
  tacotron_test_size: 0.05
  tacotron_zoneout_rate: 0.1
  train_with_GTA: False
  trim_fft_size: 512
  trim_hop_size: 128
  trim_top_db: 23
  use_lws: False
  utterance_min_duration: 1.6
  win_size: 800
Loaded metadata for 290550 examples (366.70 hours)
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape): 
  Train mode:               True
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out (cond):       (?, ?, 768)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       28.439 Million.
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape): 
  Train mode:               False
  Eval mode:                True
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out (cond):       (?, ?, 768)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       28.439 Million.
Tacotron training set to a maximum of 2000000 steps
Loading checkpoint synthesizer/saved_models/logs-my_run\taco_pretrained\tacotron_model.ckpt-0

Generated 64 train batches of size 36 in 3.626 sec
Step       1 [5.798 sec/step, loss=14.85899, avg_loss=14.85899]

Saving Model Character Embeddings visualization..
Tacotron Character embeddings have been updated on tensorboard!
Step       2 [3.362 sec/step, loss=11.10468, avg_loss=12.98183]

Generated 403 test batches of size 36 in 15.574 sec
Exiting due to exception: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[36,512,1,702] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Tacotron_model/inference/postnet_convolutions/conv_layer_1_postnet_convolutions/conv1d/conv1d (defined at e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Tacotron_model/clip_by_global_norm/mul_30/_479]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[36,512,1,702] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Tacotron_model/inference/postnet_convolutions/conv_layer_1_postnet_convolutions/conv1d/conv1d (defined at e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'Tacotron_model/inference/postnet_convolutions/conv_layer_1_postnet_convolutions/conv1d/conv1d':
  File "synthesizer_train.py", line 55, in <module>
    tacotron_train(args, log_dir, hparams)
  File "C:\Users\matve\Documents\Tacotron\Real-Time-Voice-Cloning\synthesizer\train.py", line 392, in tacotron_train
    return train(log_dir, args, hparams)
  File "C:\Users\matve\Documents\Tacotron\Real-Time-Voice-Cloning\synthesizer\train.py", line 148, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "C:\Users\matve\Documents\Tacotron\Real-Time-Voice-Cloning\synthesizer\train.py", line 91, in model_train_mode
    is_training=True, split_infos=feeder.split_infos)
  File "C:\Users\matve\Documents\Tacotron\Real-Time-Voice-Cloning\synthesizer\models\tacotron.py", line 230, in initialize
    residual = postnet(decoder_output)
  File "C:\Users\matve\Documents\Tacotron\Real-Time-Voice-Cloning\synthesizer\models\modules.py", line 406, in __call__
    "conv_layer_{}_".format(i + 1) + self.scope)
  File "C:\Users\matve\Documents\Tacotron\Real-Time-Voice-Cloning\synthesizer\models\modules.py", line 420, in conv1d
    padding="same")
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\layers\convolutional.py", line 218, in conv1d
    return layer.apply(inputs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 1700, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\layers\base.py", line 548, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 854, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\autograph\impl\api.py", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\autograph\impl\api.py", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\autograph\impl\api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 387, in call
    return super(Conv1D, self).call(inputs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
    return self.conv_op(inp, filter)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
    return self.call(inp, filter)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
    name=self.name)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 227, in _conv1d
    name=name)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 574, in new_func
    return func(*args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 574, in new_func
    return func(*args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1681, in conv1d
    name=name)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "e:\ProgramData\Miniconda3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

2021-02-05 20:02:33.232435: W tensorflow/core/kernels/queue_base.cc:277] _1_datafeeder/eval_queue: Skipping cancelled enqueue attempt with queue not closed
2021-02-05 20:02:33.232577: W tensorflow/core/kernels/queue_base.cc:277] _0_datafeeder/input_queue: Skipping cancelled enqueue attempt with queue not closed

I think the model can't fit in the memory of my GTX 1660 Super. Please tell a noob what to do.
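For reference, the hint in the traceback ("add report_tensor_allocations_upon_oom to RunOptions") maps to the TF 1.x session API. A minimal sketch; which sess.run call in synthesizer/train.py to edit is an assumption, the API itself is standard TF 1.x:

    import tensorflow as tf  # TF 1.x

    # Ask TensorFlow to dump the live allocations when an OOM occurs.
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)
    # ...then pass it to the training step, e.g.:
    # sess.run([train_op, loss], options=run_options)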

ghost commented 3 years ago

Reduce the batch size: https://github.com/CorentinJ/Real-Time-Voice-Cloning/blob/5425557efe30863267f805851f918124191e0be0/synthesizer/hparams.py#L243
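To see why that helps: the tensor in the OOM message has shape [batch, channels, 1, frames] = [36, 512, 1, 702], so its size scales linearly with tacotron_batch_size. A quick back-of-the-envelope in plain Python, with the numbers taken from the error above:

    # Size of the float32 tensor that failed to allocate, per the error message.
    batch, channels, frames = 36, 512, 702
    bytes_per_float = 4
    size = batch * channels * frames * bytes_per_float
    print(f"{size / 2**20:.1f} MiB")  # ~49.4 MiB for this one activation
    # At tacotron_batch_size = 20 the same tensor needs ~27.4 MiB, and every
    # other activation in the graph shrinks by the same factor.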

tailname commented 3 years ago

Thank you. I found the maximum value that fits in 6 GB of GPU memory: tacotron_batch_size=20.
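If the --hparams argument accepts the usual comma-separated name=value overrides (an assumption, though the argument does appear in the Arguments printout above), the same change can be made without editing hparams.py:

    python synthesizer_train.py my_run <datasets_root>/SV2TTS/synthesizer --hparams "tacotron_batch_size=20"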

tailname commented 3 years ago

Reduce the batch size:

https://github.com/CorentinJ/Real-Time-Voice-Cloning/blob/5425557efe30863267f805851f918124191e0be0/synthesizer/hparams.py#L243

One question: do you know how many steps the pretrained synthesizer and vocoder were trained for?

ghost commented 3 years ago

Depends on the quality of the training data. The step counts of the pretrained models are a good target (278k for the synthesizer, 428k for the vocoder). There is a learning curve to training; I recommend getting some experience with a proven dataset like LibriSpeech before changing the language.