kin0303 closed this 2 years ago
I tried running it again and got this error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
There's an error in the Tacotron models that is being fixed by https://github.com/coqui-ai/TTS/pull/977/commits/27825d815044f75c2c8b43177a5aa21f47b33bae. I'd suggest running the training again after the PR has been merged.
Still getting the error:
--> STEP: 2027/3243 -- GLOBAL_STEP: 5270
| > decoder_loss: 34.25191 (34.01939)
| > postnet_loss: 36.28823 (36.02014)
| > stopnet_loss: 0.70222 (0.75120)
| > decoder_coarse_loss: 34.14155 (33.93135)
| > decoder_ddc_loss: 0.00166 (0.00278)
| > ga_loss: 0.00311 (0.00502)
| > decoder_diff_spec_loss: 0.44446 (0.43518)
| > postnet_diff_spec_loss: 4.45212 (4.42896)
| > decoder_ssim_loss: 0.99998 (0.99993)
| > postnet_ssim_loss: 0.99982 (0.99966)
| > loss: 28.61271 (28.48565)
| > align_error: 0.98953 (0.98365)
| > grad_norm: 3.22861 (3.79975)
| > current_lr: 0.00000
| > step_time: 0.71590 (0.44813)
| > loader_time: 0.00150 (0.00726)
! Run is kept in /media/DATA-2/TTS/coqui/TTS/run-April-14-2022_09+10AM-0cf3265a
Traceback (most recent call last):
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1461, in fit
self._fit()
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1445, in _fit
self.train_epoch()
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1224, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1057, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 946, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion)
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 902, in _model_train_step
return model.train_step(*input_args)
File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/models/tacotron2.py", line 279, in train_step
outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/models/tacotron2.py", line 172, in forward
decoder_outputs, alignments, stop_tokens = self.decoder(encoder_outputs, mel_specs, input_mask)
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/layers/tacotron/tacotron2.py", line 321, in forward
decoder_output, attention_weights, stop_token = self.decode(memory)
File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/layers/tacotron/tacotron2.py", line 277, in decode
self.decoder_hidden, self.decoder_cell = self.decoder_rnn(
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 1179, in forward
ret = _VF.lstm_cell(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
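One way to narrow this down (a debugging sketch, not a fix proposed in this thread) is to rerun with `CUDA_LAUNCH_BLOCKING=1` and check whether the op at the bottom of the traceback, `nn.LSTMCell` (`_VF.lstm_cell`), also fails when run on its own outside the training loop. CUDA errors are reported asynchronously by default, so forcing synchronous launches makes the traceback point at the call that actually failed. The function name `lstm_cell_smoke_test` and the tensor sizes are hypothetical placeholders, not values from the Tacotron2 config.

```python
import os

# CUDA kernels run asynchronously, so an error such as
# CUBLAS_STATUS_EXECUTION_FAILED can surface at a later call than the one that
# failed. Forcing synchronous launches makes the traceback point at the failing
# op. This must be set before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def lstm_cell_smoke_test(batch_size=32, input_size=256, hidden_size=1024, steps=200):
    """Repeatedly run a standalone nn.LSTMCell on the GPU.

    Hypothetical helper; the sizes are placeholders, not Tacotron2 settings.
    Returns True if the final hidden state contains only finite values.
    """
    import torch  # imported after the env var above has been set

    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available")
    device = torch.device("cuda")
    cell = torch.nn.LSTMCell(input_size, hidden_size).to(device)
    h = torch.zeros(batch_size, hidden_size, device=device)
    c = torch.zeros(batch_size, hidden_size, device=device)
    with torch.no_grad():
        for _ in range(steps):
            x = torch.randn(batch_size, input_size, device=device)
            h, c = cell(x, (h, c))  # same op as _VF.lstm_cell in the traceback
    torch.cuda.synchronize()  # surface any pending CUDA error here
    return bool(torch.isfinite(h).all())


if __name__ == "__main__":
    print("LSTMCell OK:", lstm_cell_smoke_test())
```

If the standalone cell runs cleanly, that hints the failure depends on something specific to the training run rather than on the cuBLAS setup alone; treat this as a heuristic, not a diagnosis.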
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
I trained the Tacotron2 DDC model on the LJSpeech dataset. Training ran for a while and then stopped with: Segmentation fault (core dumped)
The output was as follows:
Environment
{
  "CUDA": {
    "GPU": ["NVIDIA GeForce GTX 1660 Ti"],
    "available": true,
    "version": "10.2"
  },
  "Packages": {
    "PyTorch_debug": false,
    "PyTorch_version": "1.11.0+cu102",
    "TTS": "0.6.1",
    "numpy": "1.19.5"
  },
  "System": {
    "OS": "Linux",
    "architecture": ["64bit", "ELF"],
    "processor": "x86_64",
    "python": "3.8.0",
    "version": "#118~18.04.1-Ubuntu SMP Thu Mar 3 13:53:15 UTC 2022"
  }
}
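For the "Segmentation fault (core dumped)" case, the shell message alone does not say where the crash happened. A minimal sketch, assuming training is launched from a Python script you can edit, is to enable the standard-library `faulthandler` module at the top of that script so the interpreter prints the Python traceback of every thread when it receives a fatal signal such as SIGSEGV. The same effect is available without code changes via `python -X faulthandler` or the `PYTHONFAULTHANDLER=1` environment variable. `main()` is a hypothetical placeholder for the training entry point.

```python
import faulthandler
import sys

# Dump the Python traceback of all threads on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS or SIGILL. faulthandler needs a real file descriptor, so use the
# process's original stderr rather than a possibly redirected sys.stderr.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def main():
    # Hypothetical placeholder: start the training run here.
    pass


if __name__ == "__main__":
    main()
```

The traceback only shows Python frames, so a crash inside a native library (for example a CUDA or cuBLAS call) will appear as the Python line that called into it.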