Closed mesut92 closed 1 year ago
Can't reproduce. In general, this is an out-of-memory (OOM) issue.
I am getting the same error on an RTX 4090 with the LJSpeech dataset, using !CUDA_VISIBLE_DEVICES=0 python3 recipes/ljspeech/vits_tts/train_vits.py
edit: a solution is mentioned here
Describe the bug
I am trying to train VITS on LJSpeech with an RTX 4090. I get the error below and could not fix it. I have updated torch and the NVIDIA drivers.
To Reproduce
Run this command: python recipes/turk/vits_tts/train_vits.py
Getting this error:
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
 ! Run is removed from /media/mesut/Depo1/works/TTS/recipes/turk/vits_tts/vits_ljspeech-February-26-2023_08+55AM-0000000
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1126, in train_step
    batch = self.format_batch(batch)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 926, in format_batch
    batch = self.model.format_batch_on_device(batch)
  File "/media/mesut/Depo1/works/TTS/TTS/tts/models/vits.py", line 1503, in format_batch_on_device
    batch["spec"] = wav_to_spec(wav, ac.fft_size, ac.hop_length, ac.win_length, center=False)
  File "/media/mesut/Depo1/works/TTS/TTS/tts/models/vits.py", line 123, in wav_to_spec
    spec = torch.stft(
  File "/usr/local/lib/python3.8/dist-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
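Since the traceback ends in torch.stft rather than in the model itself, one way to narrow this down might be to call torch.stft on its own, outside the trainer. The following is only a minimal sketch under stated assumptions: stft_smoke_test is a hypothetical helper (it is not part of TTS or the trainer), and the STFT sizes are placeholder values, not read from the recipe's audio config.

```python
try:
    import torch
except ImportError:  # lets the script report a missing install instead of crashing
    torch = None


def stft_smoke_test(device="cpu", n_fft=1024, hop_length=256, win_length=1024):
    """Run one small torch.stft call on `device` and return (ok, message).

    Hypothetical helper for illustration; the default sizes are placeholders.
    """
    if torch is None:
        return False, "torch is not installed"
    if device.startswith("cuda") and not torch.cuda.is_available():
        return False, "CUDA is not available in this torch build"
    try:
        # One second of random "audio" at 22.05 kHz, batch size 1.
        wav = torch.randn(1, 22050, device=device)
        window = torch.hann_window(win_length, device=device)
        spec = torch.stft(
            wav,
            n_fft,
            hop_length=hop_length,
            win_length=win_length,
            window=window,
            center=False,
            return_complex=True,
        )
        if device.startswith("cuda"):
            # Wait for the GPU so asynchronous errors surface inside this try block.
            torch.cuda.synchronize()
        return True, f"stft ok on {device}, output shape {tuple(spec.shape)}"
    except RuntimeError as err:
        return False, f"stft failed on {device}: {err}"


if __name__ == "__main__":
    if torch is not None:
        # torch.version.cuda is None for CPU-only builds.
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        if torch.cuda.is_available():
            print("GPU:", torch.cuda.get_device_name(0),
                  "capability", torch.cuda.get_device_capability(0))
    for dev in ("cpu", "cuda"):
        print(dev, stft_smoke_test(dev))
```

If the CPU call passes and the CUDA call fails with the same cuFFT error on a tensor this small, that would point toward the torch/CUDA build and GPU combination rather than the recipe or batch size, although it does not by itself rule out an OOM condition during real training.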
Expected behavior
start to train
Logs
No response
Environment
Additional context
No response