Tacotron lost some words when inference ?

NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

BSD 3-Clause "New" or "Revised" License


Tacotron lost some words when inference ? #452

Open leminhnguyen opened 3 years ago

leminhnguyen commented 3 years ago

I trained my own model a few days ago, and today I started testing it with the input "I love fruits like apple, banana, watermelon,..."
On the first try, the audio contained the complete input sentence.
On the second try, the word banana was missing from the predicted audio.
I tried several times, and the word banana sometimes appears in the predicted audio and sometimes does not. I guess I have some trouble with silences, which causes the unstable result.

Any suggestions? Thanks in advance.

EuphoriaCelestial commented 3 years ago

I'm facing the same error, btw. Which language are you training on?

leminhnguyen commented 3 years ago


I trained it for Vietnamese. I think the reason is that Tacotron is an autoregressive model, so:

Synthesized speech is usually not robust. Due to error propagation [3] and the wrong attention alignments between text and speech in the autoregressive generation, the generated mel-spectrogram is usually deficient with the problem of words skipping and repeating [19]
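One rough way to check whether a particular run skipped or repeated words is to inspect the attention alignment it produced. The sketch below is a hypothetical diagnostic in plain Python, not part of the tacotron2 repo. It assumes the alignment is available as a list of decoder steps, each holding attention weights over the input symbols (characters or phonemes). The helper names and the `min_mass` threshold are made up for illustration. An input symbol that receives almost no attention over the whole utterance was probably skipped, and a decoder step whose most-attended symbol moves backwards often goes with a repeated word.

```python
from typing import List, Sequence


def attended_mass(alignment: Sequence[Sequence[float]]) -> List[float]:
    """Sum the attention each input symbol receives over all decoder steps."""
    n_inputs = len(alignment[0])
    mass = [0.0] * n_inputs
    for step in alignment:  # one row per decoder step
        for j, w in enumerate(step):
            mass[j] += w
    return mass


def find_skipped_tokens(alignment: Sequence[Sequence[float]],
                        min_mass: float = 0.5) -> List[int]:
    """Indices of input symbols whose total attention is below min_mass.

    min_mass is a hypothetical threshold; tune it on your own data.
    """
    return [j for j, m in enumerate(attended_mass(alignment)) if m < min_mass]


def find_backward_jumps(alignment: Sequence[Sequence[float]]) -> List[int]:
    """Decoder steps where the most-attended input symbol moves backwards."""
    focus = [max(range(len(step)), key=step.__getitem__) for step in alignment]
    return [t for t in range(1, len(focus)) if focus[t] < focus[t - 1]]


# Toy alignment: 5 decoder steps over 4 input symbols; symbol 2 is never attended.
toy = [
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.9, 0.05, 0.05],
    [0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]
print(find_skipped_tokens(toy))  # [2]
print(find_backward_jumps(toy))  # []
```

Because the model attends over symbols rather than words, a dropped word such as "banana" would show up as a run of consecutive flagged indices rather than a single one.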

You can read more in the FastSpeech paper from Microsoft (https://arxiv.org/pdf/1905.09263.pdf)
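For contrast, FastSpeech does not rely on attention to align text and speech at inference time. Instead, a length regulator expands each phoneme's hidden state according to a predicted duration, so the number of frames per phoneme is explicit. The sketch below is an illustrative plain-Python version of that idea, not the paper's implementation. `length_regulate` is a hypothetical name, the hidden states are stand-in labels, and `alpha` plays the role of the speed factor. Clamping every duration to at least one frame is an assumption of this sketch: it guarantees no input symbol is dropped. A real model could still lose a phoneme if its predicted duration rounds to zero.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(hidden: Sequence[T], durations: Sequence[float],
                    alpha: float = 1.0) -> List[T]:
    """Repeat each phoneme-level hidden state by its (scaled) duration.

    alpha > 1 lengthens every duration (slower speech); alpha < 1 shortens it.
    """
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per hidden state")
    frames: List[T] = []
    for h, d in zip(hidden, durations):
        # Round half up, then clamp to at least one frame (an assumption of
        # this sketch) so every input symbol appears in the output.
        n = max(1, int(d * alpha + 0.5))
        frames.extend([h] * n)
    return frames


if __name__ == "__main__":
    H = ["h1", "h2", "h3", "h4"]
    D = [2, 2, 3, 1]
    print(length_regulate(H, D))             # normal speed
    print(length_regulate(H, D, alpha=0.5))  # faster speech, fewer frames
```

Since the output length is fixed by the durations rather than by where attention happens to land, this design trades Tacotron's learned alignment for an explicit one. That is the property the FastSpeech paper relies on to avoid word skipping and repeating.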