Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

2 GPUs #324

Closed DonggeunYu closed 5 years ago

DonggeunYu commented 5 years ago

I used TITAN XP x2. I tried:

python train.py --model='Tacotron'

but I got the error below. Help me please!

Using TensorFlow backend.
Checkpoint path: logs-Tacotron/taco_pretrained/tacotron_model.ckpt
Loading training data from: training_data/train.txt
Using model: Tacotron
Hyperparameters:
  GL_on_GPU: True
  NN_init: True
  NN_scaler: 0.3
  allow_clipping_in_normalization: True
  attention_dim: 128
  attention_filters: 32
  attention_kernel: (31,)
  attention_win_size: 7
  batch_norm_position: after
  cbhg_conv_channels: 128
  cbhg_highway_units: 128
  cbhg_highwaynet_layers: 4
  cbhg_kernels: 8
  cbhg_pool_size: 2
  cbhg_projection: 256
  cbhg_projection_kernel_size: 3
  cbhg_rnn_units: 128
  cdf_loss: False
  cin_channels: 80
  cleaners: english_cleaners
  clip_for_wavenet: True
  clip_mels_length: True
  clip_outputs: True
  cross_entropy_pos_weight: 1
  cumulative_weights: True
  decoder_layers: 2
  decoder_lstm_units: 1024
  embedding_dim: 512
  enc_conv_channels: 512
  enc_conv_kernel_size: (5,)
  enc_conv_num_layers: 3
  encoder_lstm_units: 256
  fmax: 7600
  fmin: 55
  frame_shift_ms: None
  freq_axis_kernel_size: 3
  gate_channels: 256
  gin_channels: -1
  griffin_lim_iters: 60
  hop_size: 275
  input_type: mulaw-quantize
  kernel_size: 3
  layers: 20
  leaky_alpha: 0.4
  legacy: True
  log_scale_min: -32.23619130191664
  log_scale_min_gauss: -16.11809565095832
  lower_bound_decay: 0.1
  magnitude_power: 2.0
  mask_decoder: False
  mask_encoder: True
  max_abs_value: 4.0
  max_iters: 10000
  max_mel_frames: 900
  max_time_sec: None
  max_time_steps: 11000
  min_level_db: -100
  n_fft: 2048
  n_speakers: 5
  normalize_for_wavenet: True
  num_freq: 1025
  num_mels: 80
  out_channels: 2
  outputs_per_step: 1
  postnet_channels: 512
  postnet_kernel_size: (5,)
  postnet_num_layers: 5
  power: 1.5
  predict_linear: True
  preemphasis: 0.97
  preemphasize: True
  prenet_layers: [256, 256]
  quantize_channels: 65536
  ref_level_db: 20
  rescale: True
  rescaling_max: 0.999
  residual_channels: 128
  residual_legacy: True
  sample_rate: 22050
  signal_normalization: True
  silence_threshold: 2
  skip_out_channels: 128
  smoothing: False
  speakers: ['speaker0', 'speaker1', 'speaker2', 'speaker3', 'speaker4']
  speakers_path: None
  split_on_cpu: True
  stacks: 2
  stop_at_any: True
  symmetric_mels: True
  synthesis_constraint: False
  synthesis_constraint_type: window
  tacotron_adam_beta1: 0.9
  tacotron_adam_beta2: 0.999
  tacotron_adam_epsilon: 1e-06
  tacotron_batch_size: 32
  tacotron_clip_gradients: True
  tacotron_data_random_state: 1234
  tacotron_decay_learning_rate: True
  tacotron_decay_rate: 0.5
  tacotron_decay_steps: 18000
  tacotron_dropout_rate: 0.5
  tacotron_final_learning_rate: 0.0001
  tacotron_fine_tuning: False
  tacotron_initial_learning_rate: 0.001
  tacotron_natural_eval: False
  tacotron_num_gpus: 1
  tacotron_random_seed: 5339
  tacotron_reg_weight: 1e-06
  tacotron_scale_regularization: False
  tacotron_start_decay: 40000
  tacotron_swap_with_cpu: False
  tacotron_synthesis_batch_size: 1
  tacotron_teacher_forcing_decay_alpha: None
  tacotron_teacher_forcing_decay_steps: 40000
  tacotron_teacher_forcing_final_ratio: 0.0
  tacotron_teacher_forcing_init_ratio: 1.0
  tacotron_teacher_forcing_mode: constant
  tacotron_teacher_forcing_ratio: 1.0
  tacotron_teacher_forcing_start_decay: 10000
  tacotron_test_batches: None
  tacotron_test_size: 0.05
  tacotron_zoneout_rate: 0.1
  train_with_GTA: True
  trim_fft_size: 2048
  trim_hop_size: 512
  trim_silence: True
  trim_top_db: 40
  upsample_activation: Relu
  upsample_scales: [11, 25]
  upsample_type: SubPixel
  use_bias: True
  use_lws: False
  use_speaker_embedding: True
  wavenet_adam_beta1: 0.9
  wavenet_adam_beta2: 0.999
  wavenet_adam_epsilon: 1e-06
  wavenet_batch_size: 8
  wavenet_clip_gradients: True
  wavenet_data_random_state: 1234
  wavenet_debug_mels: ['training_data/mels/mel-LJ001-0008.npy']
  wavenet_debug_wavs: ['training_data/audio/audio-LJ001-0008.npy']
  wavenet_decay_rate: 0.5
  wavenet_decay_steps: 200000
  wavenet_dropout: 0.05
  wavenet_ema_decay: 0.9999
  wavenet_gradient_max_norm: 100.0
  wavenet_gradient_max_value: 5.0
  wavenet_init_scale: 1.0
  wavenet_learning_rate: 0.001
  wavenet_lr_schedule: exponential
  wavenet_natural_eval: False
  wavenet_num_gpus: 1
  wavenet_pad_sides: 1
  wavenet_random_seed: 5339
  wavenet_swap_with_cpu: False
  wavenet_synth_debug: False
  wavenet_synthesis_batch_size: 20
  wavenet_test_batches: 1
  wavenet_test_size: None
  wavenet_warmup: 4000.0
  wavenet_weight_normalization: False
  win_size: 1100
Loaded metadata for 12853 examples (8.88 hours)
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape): 
  Train mode:               True
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out:              (?, ?, 512)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  linear out:               (?, ?, 1025)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       29.016 Million.
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape): 
  Train mode:               False
  Eval mode:                True
  GTA mode:                 False
  Synthesis mode:           False
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out:              (?, ?, 512)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  linear out:               (?, ?, 1025)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       29.016 Million.
Tacotron training set to a maximum of 100000 steps
Loading checkpoint logs-Tacotron/taco_pretrained/tacotron_model.ckpt-0

Generated 20 test batches of size 32 in 1.124 sec

Generated 64 train batches of size 32 in 2.243 sec
Exiting due to exception: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D (defined at /home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py:387)  = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Tacotron_model/optimizer_1/gradients/Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/ExpandDims_1)]]
     [[{{node Tacotron_model/clip_by_global_norm/mul_95/_969}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12632_Tacotron_model/clip_by_global_norm/mul_95", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D', defined at:
  File "train.py", line 138, in <module>
    main()
  File "train.py", line 128, in main
    tacotron_train(args, log_dir, hparams)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 399, in tacotron_train
    return train(log_dir, args, hparams)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 156, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 87, in model_train_mode
    is_training=True, split_infos=feeder.split_infos)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/tacotron.py", line 124, in initialize
    encoder_outputs = encoder_cell(embedded_inputs, tower_input_lengths[i])
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/Architecture_wrappers.py", line 38, in __call__
    conv_output = self._convolutions(inputs)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py", line 173, in __call__
    self.is_training, self.drop_rate, self.bnorm, 'conv_layer_{}_'.format(i + 1)+self.scope)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py", line 387, in conv1d
    padding='same')
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 214, in conv1d
    return layer.apply(inputs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 817, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 384, in call
    return super(Conv1D, self).call(inputs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 204, in __call__
    name=self.name)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 193, in _conv1d
    name=name)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 553, in new_func
    return func(*args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 553, in new_func
    return func(*args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2471, in conv1d
    data_format=data_format)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 957, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D (defined at /home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py:387)  = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Tacotron_model/optimizer_1/gradients/Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/ExpandDims_1)]]
     [[{{node Tacotron_model/clip_by_global_norm/mul_95/_969}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12632_Tacotron_model/clip_by_global_norm/mul_95", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Traceback (most recent call last):
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Tacotron_model/optimizer_1/gradients/Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/ExpandDims_1)]]
     [[{{node Tacotron_model/clip_by_global_norm/mul_95/_969}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12632_Tacotron_model/clip_by_global_norm/mul_95", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 225, in train
    step, loss, opt = sess.run([global_step, model.loss, model.optimize])
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D (defined at /home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py:387)  = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Tacotron_model/optimizer_1/gradients/Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/ExpandDims_1)]]
     [[{{node Tacotron_model/clip_by_global_norm/mul_95/_969}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12632_Tacotron_model/clip_by_global_norm/mul_95", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D', defined at:
  File "train.py", line 138, in <module>
    main()
  File "train.py", line 128, in main
    tacotron_train(args, log_dir, hparams)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 399, in tacotron_train
    return train(log_dir, args, hparams)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 156, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/train.py", line 87, in model_train_mode
    is_training=True, split_infos=feeder.split_infos)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/tacotron.py", line 124, in initialize
    encoder_outputs = encoder_cell(embedded_inputs, tower_input_lengths[i])
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/Architecture_wrappers.py", line 38, in __call__
    conv_output = self._convolutions(inputs)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py", line 173, in __call__
    self.is_training, self.drop_rate, self.bnorm, 'conv_layer_{}_'.format(i + 1)+self.scope)
  File "/home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py", line 387, in conv1d
    padding='same')
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 214, in conv1d
    return layer.apply(inputs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 817, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 384, in call
    return super(Conv1D, self).call(inputs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 204, in __call__
    name=self.name)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 193, in _conv1d
    name=name)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 553, in new_func
    return func(*args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 553, in new_func
    return func(*args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2471, in conv1d
    data_format=data_format)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 957, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/modeep/anaconda3/envs/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D (defined at /home/modeep/Documents/GitHub/Text2Speech/tacotron/models/modules.py:387)  = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Tacotron_model/optimizer_1/gradients/Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, Tacotron_model/inference/encoder_convolutions/conv_layer_1_encoder_convolutions/conv1d/conv1d/ExpandDims_1)]]
     [[{{node Tacotron_model/clip_by_global_norm/mul_95/_969}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_12632_Tacotron_model/clip_by_global_norm/mul_95", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
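
The "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize" error above is, in practice, most often GPU memory exhaustion rather than a broken cuDNN install: TensorFlow grabs nearly all GPU memory at session creation, and cuDNN then has no workspace left. A minimal sketch of the usual mitigations, assuming train.py accepts a `--hparams` override string (the `tacotron_batch_size` name is taken from the hyperparameter dump in the log above):

```python
import os
import subprocess

# Hedged sketch, not a confirmed fix for this thread:
# (a) make sure no other process is holding the GPUs (check with nvidia-smi),
# (b) try a smaller batch size so cuDNN has workspace memory left.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0,1")  # both TITAN XPs visible

# Hyperparameter name comes from the log above; the --hparams override
# syntax is an assumption about this repo's train.py CLI.
cmd = ["python", "train.py", "--model=Tacotron",
       "--hparams=tacotron_batch_size=16"]
print(" ".join(cmd))
# subprocess.run(cmd, env=env)  # uncomment to actually launch training
```

With a TF 1.x session you can also set `gpu_options.allow_growth = True` in the `ConfigProto` passed to `tf.Session`, so memory is allocated on demand instead of up front.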
DonggeunYu commented 5 years ago

I fixed it!

kitlomer commented 5 years ago

Hi, sorry to bother you. May I ask how you fixed it? I set CUDA_VISIBLE_DEVICES to 0,1, and now the code occupies both of my GPUs, but my training speed is the same as when training with a single GPU.
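
This symptom (both GPUs occupied but single-GPU speed) is consistent with the hyperparameter dump in the log above: `tacotron_num_gpus` is 1, so only one model tower is built even when two devices are visible. A hedged sketch of a two-GPU launch; the `--hparams` override syntax is an assumption about this repo's train.py CLI:

```python
import os

# Making both GPUs visible to TensorFlow is not enough on its own...
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# ...the hparams must also request two towers. The hparam name
# tacotron_num_gpus comes from the log above (where it is 1).
num_gpus = 2
cmd = ("python train.py --model='Tacotron' "
       "--hparams='tacotron_num_gpus={}'".format(num_gpus))
print(cmd)
```

If the data pipeline cannot keep two towers fed, speed may still not double; the `split_on_cpu: True` hparam in the log suggests batches are split on the CPU before being dispatched to the towers.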

coloneldbugger commented 5 years ago

I'm seeing the same issue: only 1 of my available GPUs is actually used. It starts multiple processes, one per card, but nvidia-smi shows that all of the processes are running on the same card. I also see no performance increase from the additional cards.