NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0
887 stars, 177 forks

Higher number of frames inference issue #44

Closed artemg closed 4 years ago

artemg commented 4 years ago

Running inference on the provided model with longer texts and a higher number of frames causes some strange effects, such as the same words looping several times, while only the last ~6 seconds are correct.

python3 inference.py -c config.json -f flowtron_ljs.pt -w waveglow_256channels_universal_v5.pt -t "Invertible models like Flowtron can be easier to train, because they can learn the distribution of the real-world training data directly. As a result, the flow-based approach to text-to-spectrogram generation provides more realism and more expressivity than current state-of-the-art speech synthesis models. Flowtron achieves this by giving users control over non-textual characteristics, enabling them to make a monotonic speaker sound expressive." -i 0 -n 2000

[Attached attention plots: attnlayer0 (sid0_sigma0 5_attnlayer0), attnlayer1 (sid0_sigma0 5_attnlayer1)]

rafaelvalle commented 4 years ago

Such attention issues are fixed in Flowtron Parallel, which we'll make public soon. https://twitter.com/RafaelValleArt/status/1281268833504751616?s=20

For now you can split at punctuation.
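A minimal sketch of that workaround, assuming the goal is to break the long input into sentence-sized chunks before synthesis. The helper name `split_at_punctuation` and the `max_chars` soft limit are hypothetical and not part of the Flowtron repository; the comma fallback for overly long sentences is likewise an assumption.

```python
import re


def split_at_punctuation(text, max_chars=200):
    """Split text into chunks that end at punctuation.

    Hypothetical helper, not part of the Flowtron repository.
    max_chars is an assumed soft limit; tune it for your model.
    """
    # Split after ., !, ? or ; followed by whitespace, keeping the punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?;])\s+", text) if s.strip()]
    chunks = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            chunks.append(sentence)
            continue
        # Assumed fallback: break overly long sentences at commas.
        current = ""
        for part in re.split(r"(?<=,)\s+", sentence):
            if current and len(current) + 1 + len(part) > max_chars:
                chunks.append(current)
                current = part
            else:
                current = f"{current} {part}".strip()
        if current:
            chunks.append(current)
    return chunks
```

Each chunk could then be passed to `inference.py` through `-t` in a separate run, with the resulting audio concatenated afterwards. That workflow is an assumption here, not something the repository provides out of the box.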

artemg commented 4 years ago

Thanks for such a fast reply! I'll take a look at Flowtron Parallel when it is released.

rui-lin commented 3 years ago

Any update on Flowtron Parallel?

rafaelvalle commented 3 years ago

https://openreview.net/pdf?id=0NQwnnwAORi