Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

Exiting due to exception: OOM when allocating tensor with shape #158

Closed: avivelor closed this issue 6 years ago

avivelor commented 6 years ago

Hi all,

I'm currently running Tacotron-2 with the following setup:

- OS: Ubuntu 16.04, 64-bit
- TensorFlow-GPU: 1.8 (built from source)
- Python: 3.6 (Anaconda)
- CUDA/cuDNN: 9.0/7.1
- GPU: Nvidia GTX 1080

I'm able to train tacotron fine, but when I hit wavenet I get the following error: "Exiting due to exception: OOM when allocating tensor with shape[9216,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/Pad, model/optimizer/gradients/model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose_3_grad/InvertPermutation)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info."

Does anyone know why I'm getting this error and/or how to fix it? Thank you for your time!

Full details can be found below:

```
#############################################################

Wavenet Train

###########################################################

Checkpoint_path: logs-Tacotron-2/wave_pretrained/wavenet_model.ckpt Loading training data from: tacotron_output/gta/map.txt Using model: Tacotron-2 Hyperparameters: allow_clipping_in_normalization: True attention_dim: 128 attention_filters: 32 attention_kernel: (31,) cin_channels: 80 cleaners: english_cleaners clip_mels_length: True cross_entropy_pos_weight: 1 cumulative_weights: True decoder_layers: 2 decoder_lstm_units: 1024 embedding_dim: 512 enc_conv_channels: 512 enc_conv_kernel_size: (5,) enc_conv_num_layers: 3 encoder_lstm_units: 256 fmax: 7600 fmin: 0 frame_shift_ms: None freq_axis_kernel_size: 3 gate_channels: 512 gin_channels: -1 griffin_lim_iters: 60 hop_size: 300 input_type: raw kernel_size: 3 layers: 30 leaky_alpha: 0.4 log_scale_min: -32.23619130191664 log_scale_min_gauss: -16.11809565095832 mask_decoder: False mask_encoder: False max_abs_value: 4.0 max_iters: 1000 max_mel_frames: 1300 max_time_sec: None max_time_steps: 8000 min_level_db: -100 n_fft: 2048 n_speakers: 5 natural_eval: False normalize_for_wavenet: True num_freq: 1025 num_mels: 80 out_channels: 2 outputs_per_step: 2 postnet_channels: 512 postnet_kernel_size: (5,) postnet_num_layers: 5 power: 1.5 predict_linear: True prenet_layers: [256, 256] quantize_channels: 65536 ref_level_db: 20 rescale: True rescaling_max: 0.999 residual_channels: 512 sample_rate: 24000 signal_normalization: True silence_threshold: 2 skip_out_channels: 256 smoothing: False stacks: 3 stop_at_any: True symmetric_mels: False tacotron_adam_beta1: 0.9 tacotron_adam_beta2: 0.999 tacotron_adam_epsilon: 1e-06 tacotron_batch_size: 32 tacotron_clip_gradients: False tacotron_data_random_state: 1234 tacotron_decay_learning_rate: True tacotron_decay_rate: 0.4 tacotron_decay_steps: 50000 tacotron_dropout_rate: 0.5 tacotron_final_learning_rate: 1e-05 tacotron_initial_learning_rate: 0.001 tacotron_random_seed: 5339 tacotron_reg_weight: 1e-06 tacotron_scale_regularization: True tacotron_start_decay: 50000 tacotron_swap_with_cpu: False tacotron_synthesis_batch_size: 512 tacotron_teacher_forcing_decay_alpha: 0.0 tacotron_teacher_forcing_decay_steps: 280000 tacotron_teacher_forcing_final_ratio: 0.0 tacotron_teacher_forcing_init_ratio: 1.0 tacotron_teacher_forcing_mode: constant tacotron_teacher_forcing_ratio: 1.0 tacotron_teacher_forcing_start_decay: 10000 tacotron_test_batches: 48 tacotron_test_size: None tacotron_zoneout_rate: 0.1 train_with_GTA: True trim_fft_size: 512 trim_hop_size: 128 trim_silence: True trim_top_db: 23 upsample_conditional_features: True upsample_scales: [15, 20] use_bias: True use_lws: False use_speaker_embedding: True wavenet_adam_beta1: 0.9 wavenet_adam_beta2: 0.999 wavenet_adam_epsilon: 1e-08 wavenet_batch_size: 4 wavenet_data_random_state: 1234 wavenet_dropout: 0.05 wavenet_ema_decay: 0.9999 wavenet_learning_rate: 0.001 wavenet_random_seed: 5339 wavenet_swap_with_cpu: False wavenet_synthesis_batch_size: 4 wavenet_test_batches: None wavenet_test_size: 0.0441 win_size: 1200 Initializing Wavenet model. Dimensions (? = dynamic shape): Train mode: True Eval mode: False Synthesis mode: False inputs: (?, 1, ?) local_condition: (?, 80, ?) targets: (?, ?) outputs: (?, ?) Initializing Wavenet model. Dimensions (? = dynamic shape): Train mode: False Eval mode: True Synthesis mode: False local_condition: (1, 80, ?) targets: (?,) outputs: (?,) Wavenet training set to a maximum of 1300000 steps

Generated 32 train batches of size 4 in 0.105 sec

Generated 578 test batches of size 1 in 0.376 sec Exiting due to exception: OOM when allocating tensor with shape[9216,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/Pad, model/optimizer/gradients/model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose_3_grad/InvertPermutation)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: model/inference/residual_block_skip_conv_layer_ResidualConv1dGLU_29/strided_slice_2/_1509 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10290...ed_slice_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose', defined at: File "train.py", line 133, in main() File "train.py", line 127, in main train(args, log_dir, hparams) File "train.py", line 80, in train checkpoint = wavenet_train(args, log_dir, hparams, input_path) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 251, in wavenet_train return train(log_dir, args, hparams, input_path) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 175, in train model, stats = model_train_mode(args, feeder, hparams, global_step) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 123, in model_train_mode feeder.input_lengths, x=feeder.inputs) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 176, in initialize y_hat = self.step(x, c, g, softmax=False) #softmax is automatically computed inside softmax_cross_entropy if needed File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 481, in step x, h = conv(x, c, g_bct) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenetvocoder/models/modules.py", line 277, in call x, s, = self.step(x, c, g, False) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/modules.py", line 301, in step x = self.conv(x) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenetvocoder/models/modules.py", line 162, in call inputs = self._to_dilation(inputs) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/modules.py", line 117, in _to_dilation inputs_transposed = tf.transpose(inputs_padded, [2, 0, 1]) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1408, in transpose ret = transpose_fn(a, perm, name=name) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8636, in transpose "Transpose", x=x, perm=perm, name=name) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op op_def=op_def) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1740, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[9216,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/Pad, model/optimizer/gradients/model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose_3_grad/InvertPermutation)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: model/inference/residual_block_skip_conv_layer_ResidualConv1dGLU_29/strided_slice_2/_1509 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10290...ed_slice_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last): File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call return fn(*args) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[9216,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/Pad, model/optimizer/gradients/model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose_3_grad/InvertPermutation)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: model/inference/residual_block_skip_conv_layer_ResidualConv1dGLU_29/strided_slice_2/_1509 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10290...ed_slice_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 217, in train step, y_hat, loss, opt = sess.run([global_step, model.y_hat, model.loss, model.optimize]) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run run_metadata_ptr) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run feed_dict_tensor, options, run_metadata) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run run_metadata) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[9216,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/Pad, model/optimizer/gradients/model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose_3_grad/InvertPermutation)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: model/inference/residual_block_skip_conv_layer_ResidualConv1dGLU_29/strided_slice_2/_1509 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10290...ed_slice_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose', defined at: File "train.py", line 133, in main() File "train.py", line 127, in main train(args, log_dir, hparams) File "train.py", line 80, in train checkpoint = wavenet_train(args, log_dir, hparams, input_path) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 251, in wavenet_train return train(log_dir, args, hparams, input_path) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 175, in train model, stats = model_train_mode(args, feeder, hparams, global_step) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/train.py", line 123, in model_train_mode feeder.input_lengths, x=feeder.inputs) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 176, in initialize y_hat = self.step(x, c, g, softmax=False) #softmax is automatically computed inside softmax_cross_entropy if needed File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/wavenet.py", line 481, in step x, h = conv(x, c, g_bct) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenetvocoder/models/modules.py", line 277, in call x, s, = self.step(x, c, g, False) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/modules.py", line 301, in step x = self.conv(x) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenetvocoder/models/modules.py", line 162, in call inputs = self._to_dilation(inputs) File "/home/wbemergingtech/Desktop/Tacotron2-Rayhane-Implenetation/Tacotron-2/wavenet_vocoder/models/modules.py", line 117, in _to_dilation inputs_transposed = tf.transpose(inputs_padded, [2, 0, 1]) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1408, in transpose ret = transpose_fn(a, perm, name=name) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8636, in transpose "Transpose", x=x, perm=perm, name=name) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op op_def=op_def) File "/home/wbemergingtech/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1740, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[9216,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/Pad, model/optimizer/gradients/model/inference/residual_block_conv_layer_ResidualConv1dGLU_29/transpose_3_grad/InvertPermutation)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: model/inference/residual_block_skip_conv_layer_ResidualConv1dGLU_29/strided_slice_2/_1509 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10290...ed_slice_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last): File "train.py", line 133, in main() File "train.py", line 127, in main train(args, log_dir, hparams) File "train.py", line 82, in train raise ('Error occured while training Wavenet, Exiting!') TypeError: exceptions must derive from BaseException
```
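The hint about `report_tensor_allocations_upon_oom` in the log refers to TensorFlow's `RunOptions`. As an illustration only (not a patch from this repository), the flag could be passed to the failing `sess.run` call in `wavenet_vocoder/train.py` roughly like this, using the fetches named in the traceback:

```python
import tensorflow as tf

# Ask TensorFlow to dump the list of live tensor allocations if an OOM occurs,
# which helps identify which ops are consuming GPU memory.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# The training step from the traceback, with the extra options argument.
step, y_hat, loss, opt = sess.run(
    [global_step, model.y_hat, model.loss, model.optimize],
    options=run_options)
```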

Rayhane-mamah commented 6 years ago

Hello,

That's a GPU memory error. It happens when your model uses all available memory on the GPU. The GTX 1080 has 8 GB of VRAM, I believe. To avoid this, you can reduce wavenet_batch_size from 4 to 2 (in hparams.py). That should clear the error.

If you want to take this a step further, you can increase max_time_steps to the largest value you can use before the model hits OOM again (after reducing the batch size, of course). That will make training go more smoothly. A sketch of the relevant hparams.py entries follows.
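For illustration, a minimal sketch of the two entries in hparams.py that drive WaveNet's GPU memory use (the parameter names match the hyperparameter dump above; the values shown are suggested starting points, not the repository defaults):

```python
import tensorflow as tf

# Excerpt of the WaveNet training hyperparameters relevant to GPU memory.
hparams = tf.contrib.training.HParams(
    # ... other hyperparameters unchanged ...
    wavenet_batch_size=2,   # halved from 4 to roughly halve per-step memory use
    max_time_steps=8000,    # raise gradually once batch_size=2 trains without OOM
)
```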

avivelor commented 6 years ago

@Rayhane-mamah thanks for the speedy reply, It worked!

Rayhane-mamah commented 6 years ago

Glad that worked out :)