as-ideas / TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer-based neural network for text-to-speech.
https://as-ideas.github.io/TransformerTTS/

Get rid of the "robotic" sound #121

Closed: alexvwegen closed this issue 2 years ago

alexvwegen commented 3 years ago

I successfully trained a German model on a relatively small dataset (ca. 7,000 sentences); I attached an example from step 90k. Compared to my results with other TTS repos, I already consider this pretty decent, but I was wondering what can be done about the "robotic" flanging in the voice. Could this be related to the small size of my dataset? I tried tweaking the FFT parameters, but that did not help...

It would be nice if someone had an idea or a tip for me :-)

custom_text_mb_493be63_90000.zip

egerong commented 2 years ago

The main branch uses Griffin-Lim for vocoding. Take a look at the vocoding branch to use MelGAN or HiFiGAN for more natural results.
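For context on why Griffin-Lim tends to sound "robotic": a spectrogram model predicts only magnitudes, so the vocoder has to estimate the phase. Griffin-Lim does this iteratively, alternating between the time domain and the STFT domain and keeping the target magnitude plus the phase of the re-analysed signal on each pass. The estimate is rarely fully consistent, and the leftover phase errors are commonly heard as a metallic or flanging quality, which may explain why tweaking FFT parameters alone does not remove it. Below is a minimal NumPy sketch of the algorithm on a linear-frequency magnitude spectrogram. It is not TransformerTTS's own code; the `stft`/`istft` helpers and the `n_fft=1024`, `hop=256`, and `n_iter` values are illustrative assumptions, and a mel-spectrogram output would first have to be mapped back to linear frequency.

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=1024, hop=256):
    """Least-squares inverse STFT (weighted overlap-add, normalised by summed window**2)."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    y = np.zeros(length)
    wsum = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        wsum[i * hop:i * hop + n_fft] += window ** 2
    nonzero = wsum > 1e-8  # avoid dividing by ~0 at the very edges
    y[nonzero] /= wsum[nonzero]
    return y


def griffin_lim(magnitude, n_iter=60, n_fft=1024, hop=256, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase: the model never predicted any phase.
    angles = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * angles, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysed signal.
        angles = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * angles, n_fft, hop)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 s test tone
    mag = np.abs(stft(x))
    y = griffin_lim(mag, n_iter=50)
    err = np.linalg.norm(mag - np.abs(stft(y))) / np.linalg.norm(mag)
    print(f"spectral convergence after 50 iterations: {err:.3f}")
```

More iterations reduce the magnitude mismatch but never recover the true phase, whereas a neural vocoder such as HiFiGAN learns to generate the waveform directly from the spectrogram, which is why switching vocoders rather than retuning the STFT is the usual fix.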

king-dahmanus commented 2 years ago

Yes, Griffin-Lim sounds pretty crappy; you need to use HiFi-GAN or some other good vocoder, just not Griffin-Lim.

alexvwegen commented 2 years ago

Thanks, guys! Well, now that you mention it, the solution seems so obvious 😄 I guess that means I have to train a new model on the vocoding branch, as well as HiFiGAN on my dataset, right?

alexvwegen commented 2 years ago

Yep, that was it. I've only trained HiFiGAN for about 25k steps so far and I'm already smiling, so I guess I can close this.