fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License
2.14k stars, 698 forks

Can't train WaveRNN with voc_mode = 'RAW' #153

Open tuong-olli opened 5 years ago

tuong-olli commented 5 years ago

I successfully trained this model with voc_mode = 'MOL', but its synthesis speed is not good. I then changed voc_mode from MOL to RAW, and training now fails with this error:

    /pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [671,0,0] Assertion `t >= 0 && t < n_classes` failed.
    THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=296 error=59 : device-side assert triggered
    Traceback (most recent call last):
      File "train_wavernn.py", line 162, in <module>
        main()
      File "train_wavernn.py", line 86, in main
        voc_train_loop(paths, voc_model, loss_func, optimizer, train_set, test_set, lr, total_steps, train_gta, batch_size)
      File "train_wavernn.py", line 129, in voc_train_loop
        loss.backward()
      File "/data/WaveRNN-master/env-py3.7/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/data/WaveRNN-master/env-py3.7/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Please help me solve this problem!

yoks commented 5 years ago

The most likely cause of this error is a data format mismatch. You need to reprocess the raw data with preprocess.py again for RAW.
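The assertion `t >= 0 && t < n_classes` in the error above fires when a target label passed to the classification loss is negative or not smaller than the number of output classes. That is what you would expect if the training targets were quantized at a higher bit depth than the RAW model's output layer covers. Below is a minimal sketch of a sanity check on quantized labels. The function name, the 9-bit default, and the sample values are assumptions for illustration only and are not taken from the repository.

```python
def find_out_of_range_labels(labels, bits=9):
    """Return (n_classes, offending), where offending holds every label
    that falls outside the valid class range [0, 2**bits)."""
    n_classes = 2 ** bits  # assumed: the model predicts 2**bits classes
    offending = [int(t) for t in labels if not 0 <= int(t) < n_classes]
    return n_classes, offending


# Hypothetical samples: labels quantized to 16 bits vs. labels quantized to 9 bits.
labels_16_bit = [0, 511, 512, 40000, 65535]
labels_9_bit = [0, 17, 255, 511]

print(find_out_of_range_labels(labels_16_bit))  # (512, [512, 40000, 65535])
print(find_out_of_range_labels(labels_9_bit))   # (512, [])
```

If a check like this reports offending labels on your preprocessed data, the data was most likely generated for a different mode or bit depth, and rerunning preprocess.py with the RAW settings should regenerate it.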

OswaldoBornemann commented 4 years ago

@tuong-olli May I ask what loss you reached and how many steps you trained WaveRNN for?

tuong-olli commented 4 years ago

@tsungruihon The loss had stopped changing by step 596k:

    | Epoch: 563/2276 (820/820) | Loss: 5.6460 | 0.7 steps/s | Step: 596k |

I changed the learning rate and continued training, but the loss only went down to 5.4514:

    | Epoch: 571/1474 (820/820) | Loss: 5.4514 | 1.0 steps/s | Step: 1260k |

The synthesized audio also had some noise.

OswaldoBornemann commented 4 years ago

@tuong-olli Same as mine. If you have some free time, please take a look at issue #177.

1zxLi commented 4 years ago

After you switched to RAW, was the loss better than before? I look forward to your reply.

Houangnt commented 3 years ago

> After you switched to RAW, was the loss better than before? I look forward to your reply.

What code did you change to use the RAW format? Thanks.