begeekmyfriend / tacotron2

Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
BSD 3-Clause "New" or "Revised" License

Question about WaveGlow #35

Closed: leijue222 closed this issue 3 years ago

leijue222 commented 3 years ago

I have trained WaveRNN and got good results. Since WaveGlow's inference on GPU is faster than WaveRNN's, I want to train a WaveGlow vocoder.

For your WaveGlow repository, I made the following modifications to make training possible:

  1. Moved the quant and mel data into it (the same data used to train WaveRNN);
  2. Changed mel_pad_val in config.json to -5.0 (I set voc_pad_val=-5.0 when training WaveRNN);
  3. Added z = z.type(torch.cuda.HalfTensor) before glow.py-L116 to fix the training error "Input type (torch.cuda.ShortTensor) and weight type (torch.cuda.HalfTensor) should be the same".

After 13 hours of training (21 epochs), the log looks like this: [screenshot, 2021-01-13 11:04:40]

When I run inference with --is_fp16, the output wav is completely silent. Without --is_fp16, the output wav is pure noise.

Did I do something wrong? Could you give me some suggestions?


Maybe the model has not trained long enough. It is now at 24 hours, 24K steps, 40 epochs. I still sometimes get this warning: Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1024.0

begeekmyfriend commented 3 years ago

The warning is only about gradient overflow; the program clips the gradient under a threshold of 1.0. Don't worry about it, since it only occurs at the beginning of training. I no longer work on TTS, so I am not familiar with this project anymore. I suggest removing z = z.type(torch.cuda.HalfTensor) and training a baseline first for comparison. Good luck.
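
For readers who have not seen it before: the warning quoted in this thread ("Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1024.0") has the form printed by dynamic loss scaling in mixed-precision (fp16) training. The message itself reports that one optimizer step was skipped and the loss scale was lowered. Below is a minimal pure-Python sketch of that idea, assuming the common scheme of halving the scale on overflow and doubling it again after a window of clean steps. The class name, parameters, and default values are hypothetical illustrations, not code from this repository or from any library.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical; not a library implementation)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale   # current loss scale
        self.factor = factor      # shrink/grow factor (assumed to be 2)
        self.window = window      # clean steps required before growing
        self.clean_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Overflow: skip this step and reduce the scale.
            self.scale /= self.factor
            self.clean_steps = 0
            print(f"Gradient overflow. Skipping step, reducing loss scale to {self.scale}")
            return False
        self.clean_steps += 1
        if self.clean_steps >= self.window:
            # Enough clean steps in a row: try a larger scale again.
            self.scale *= self.factor
            self.clean_steps = 0
        return True


scaler = DynamicLossScaler(init_scale=2048.0, window=2)
history = [
    scaler.update([0.1, float("inf")]),  # overflow: skipped, scale 2048 -> 1024
    scaler.update([0.1, 0.2]),           # first clean step
    scaler.update([0.3, -0.1]),          # second clean step: scale 1024 -> 2048
]
```

Under this scheme an occasional skipped step is expected, because the scaler periodically probes a larger scale and backs off when it overflows. A scale that keeps shrinking step after step would be a different situation and worth investigating.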