cduguet closed this issue 4 years ago
@cduguet Cristian, please download the WaveGlow weights we used with Mellotron and let us know whether they sound better: WaveGlow weights
Thank you! I will try them out. Did you train it on Mellotron outputs or on mel spectrograms computed from the audio files?
It was trained on studio-quality audio files from a female speaker.
Closing due to inactivity.
Hello, I have been training my WaveGlow network from scratch for more than 1000 epochs on a German dataset (LJSpeech), 39 hours of audio at 16 kHz.
The quality still has some issues, though. The voice sounds gargly and coarse. I tried denoising and adjusting sigma, but neither improved it much. Here are inference samples generated from a mel spectrogram computed from the original audio.
You don't need to listen to all of the samples, just ORIGINAL, BEST and TACOTRON. The others are there in case you wonder how much tuning changes the result.
[ORIGINAL]
[BEST] denoiser_strength=0.1, sigma=0.666
denoiser_strength=0.00, sigma=0.666
denoiser_strength=0.01, sigma=0.666
denoiser_strength=0.01, sigma=0.8
denoiser_strength=0.01, sigma=0.4
Surprisingly, even though I trained on ground-truth audio waveforms, inference with Tacotron-generated mel spectrograms sounds better:
[TACOTRON] Tacotron-generated mel spectrogram, denoiser_strength=0.01, sigma=0.666
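The sigma and denoiser_strength comparison above can be organized as a small sweep. The following is a minimal sketch, not the script used for these samples: `sweep_vocoder_settings`, `sample_name` and the `synthesize` callable are hypothetical names introduced here. The comment inside assumes the `infer` and `Denoiser` calls as they appear in the NVIDIA WaveGlow repository's inference script.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple


def sweep_vocoder_settings(
    synthesize: Callable[[float, float], List[float]],
    sigmas: Iterable[float],
    denoiser_strengths: Iterable[float],
) -> Dict[Tuple[float, float], List[float]]:
    """Call synthesize(sigma, denoiser_strength) for every combination.

    `synthesize` is a hypothetical user-supplied wrapper. With the NVIDIA
    WaveGlow repository it would (by assumption) look roughly like:

        audio = waveglow.infer(mel, sigma=sigma)
        if strength > 0:
            audio = denoiser(audio, strength)   # denoiser = Denoiser(waveglow)
    """
    results = {}
    for sigma, strength in product(sigmas, denoiser_strengths):
        results[(sigma, strength)] = synthesize(sigma, strength)
    return results


def sample_name(sigma: float, strength: float) -> str:
    """Build a file name that records the settings used for a sample."""
    return f"sigma{sigma:.3f}_denoise{strength:.2f}.wav"
```

Keeping the mel spectrogram fixed inside `synthesize` and varying only these two values makes the samples directly comparable.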
The question is: does anyone have experience getting rid of this gargling in the voice? Would training further help? My training curve has been flat since around epoch 900, at a loss of around -6.0.
Thank you for your suggestions!
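The flattening of the training curve mentioned in the question can be put into a number by comparing the mean loss over the most recent epochs with the mean over the window before it. This is a rough sketch under stated assumptions: `relative_improvement` and the window size are hypothetical choices, and `losses` is assumed to be a list with one loss value per epoch.

```python
from statistics import mean
from typing import Sequence


def relative_improvement(losses: Sequence[float], window: int = 50) -> float:
    """Relative drop in mean loss between the last two windows of epochs.

    A positive value means the loss is still decreasing; a value near zero
    means the curve is flat over the compared windows.
    """
    if window <= 0 or len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous == 0:
        raise ValueError("previous window mean is zero; ratio undefined")
    # Dividing by abs(previous) keeps the sign meaningful for negative
    # losses such as the value of around -6.0 reported above.
    return (previous - recent) / abs(previous)
```

For example, calling `relative_improvement(losses, window=50)` on the per-epoch losses compares epochs N-99..N-50 with N-49..N.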