NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #587

Open np-n opened 1 year ago

np-n commented 1 year ago

I am trying to fine-tune Tacotron 2. I am currently using the following CUDA and PyTorch dependencies:

cudatoolkit               10.0.130                      0  
cudnn                     7.6.5                cuda10.0_0  
pytorch                   1.2.0           cuda100py37h938c94c_0  
torchvision               0.4.0           cuda100py37hecfc37a_0  
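The package list above is what conda reports. What the Python interpreter actually loads can differ, so it may help to confirm it at runtime. Below is a minimal sketch, assuming it is run inside the same tts_train environment; the helper name report_versions is hypothetical and not part of the tacotron2 repository. It also prints the GPU name and compute capability, since CUDA 10.0 builds do not support newer architectures such as Ampere (compute capability 8.x), and a mismatch there is one possible cause of cuDNN execution failures.

```python
import platform


def report_versions():
    """Collect runtime versions relevant to a cuDNN execution failure.

    Hypothetical helper for illustration; not part of tacotron2.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not importable from this interpreter; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None on CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, such as 7605 for cuDNN 7.6.5 (None if unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) tuple; compare it against what the CUDA build supports.
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

If the reported CUDA or cuDNN version differs from the conda listing, or the GPU is newer than the CUDA build supports, that mismatch is worth resolving before looking at the model code.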

While training, I get stuck on the following runtime error: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED.

This is the full error message:

Epoch: 0
Traceback (most recent call last):
  File "train.py", line 307, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 228, in train
    y_pred = model(x)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/tts/tacotron2/model.py", line 505, in forward
    encoder_outputs = self.encoder(embedded_inputs, text_lengths)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/tts/tacotron2/model.py", line 185, in forward
    outputs, _ = self.lstm(x)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 562, in forward
    return self.forward_packed(input, hx)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 554, in forward_packed
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 529, in forward_impl
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I am looking for possible ways to resolve this. Thank you!
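One way to narrow this kind of error down is sketched below. It is a debugging aid, not a confirmed fix for this issue. CUDA kernels run asynchronously, so the traceback may point at the LSTM call even when an earlier operation failed. Setting CUDA_LAUNCH_BLOCKING=1 makes launches synchronous, and disabling cuDNN makes PyTorch fall back to its native kernels, which can show whether the failure is specific to cuDNN. The function name enable_cuda_debugging is hypothetical.

```python
import os


def enable_cuda_debugging(disable_cudnn=True):
    """Turn on switches that make CUDA errors easier to localize.

    Hypothetical helper for illustration. Call it at the very top of the
    training script, before any CUDA work is done; the environment variable
    has no effect once the CUDA context has already been initialized.
    Returns True if PyTorch was importable, False otherwise.
    """
    # Synchronous kernel launches: the Python traceback then points at the
    # operation that actually failed rather than a later one.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    try:
        import torch
    except ImportError:
        return False

    if disable_cudnn:
        # Use PyTorch's native (slower) kernels instead of cuDNN. If training
        # then runs, or fails with a different message, the problem is likely
        # tied to the cuDNN/CUDA/GPU combination rather than the model code.
        torch.backends.cudnn.enabled = False
    return True


if __name__ == "__main__":
    print("torch importable:", enable_cuda_debugging())
```

Both switches slow training down, so they are meant for a short diagnostic run only. The same environment variable can also be set in the shell before launching train.py.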