Open leminhnguyen opened 3 years ago
I am facing the same error. By the way, which language are you training on?
Synthesized speech is usually not robust. Due to error propagation [3] and the wrong attention alignments between text and speech in the autoregressive generation, the generated mel-spectrogram is usually deficient with the problem of words skipping and repeating [19].
sometimes, sometimes not in the predicted audio. So I guess I have some trouble with the silence, which leads to the unstable result. Any suggestions? Thanks in advance.