Closed by alexvwegen 2 years ago
The main branch uses Griffin-Lim for vocoding. Take a look at the vocoding branch to use MelGAN or HiFiGAN for more natural results.
Yes, Griffin-Lim sounds pretty crappy. You need to use HiFiGAN or some other good vocoder, not Griffin-Lim.
Thanks guys! ...well, now that you mention it, the solution seems so obvious 😄 I guess that means I have to train a new model in the vocoding branch, as well as HiFiGAN, with my dataset, right?
Yep, that was it...I've trained HiFiGAN for only about 25k steps so far and I'm already smiling. Guess I can close this.
I successfully trained a German model with a relatively small dataset (ca. 7000 sentences); I attached an example from 90k steps. Compared to my results from other TTS repos, I consider this pretty decent already, but I was wondering what can be done about the "robotic flange" in the voice. Could this be related to the small size of my dataset? I tried tweaking the FFT params, but that didn't help...
Would be nice if someone has an idea / tip for me :-)
custom_text_mb_493be63_90000.zip