NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Voice usually good, but occasional low MOS #128

Open · rui-lin opened this issue 3 years ago

rui-lin commented 3 years ago

Hi, thanks again for providing this repo.

I'm fine-tuning a model on ~15 min of custom voice data, and the results look pretty good. However, occasionally (maybe 10-20% of sentences) the output suddenly has a much lower MOS.

Config: 1e-4, ctc_loss_weight 0.01, use_attn_prior: true. Surprisingly, I haven't needed to train with use_attn_prior: false to get good-sounding audio, although the voice doesn't sound very much like my custom voice.

Based on the graphs, do you have any suggestions for fixing the occasional low MOS? Am I reading the attention plots correctly, i.e. is my attention sometimes broken, and do I need the CTC loss to get closer to zero?
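One way to check whether the low-MOS sentences line up with broken attention is to score each sentence's attention map automatically instead of eyeballing plots. The sketch below is a generic heuristic, not part of Flowtron: it assumes you can export one attention matrix per synthesized sentence as a NumPy array of shape (mel_frames, text_tokens), and the function name `alignment_health` and every threshold in it are hypothetical values to tune against sentences you have rated by ear. If a plot runs anti-diagonally (for example, because that flow step processes frames in reverse order), flip the frame axis with `attn[::-1]` before scoring it.

```python
import numpy as np


def alignment_health(attn, min_focus=0.5, min_coverage=0.9, backstep_tolerance=1):
    """Heuristic checks on one attention map of shape (mel_frames, text_tokens).

    Hypothetical helper; thresholds are illustrative guesses, not Flowtron defaults.
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 2 or attn.size == 0:
        raise ValueError("expected a non-empty 2-D array (mel_frames, text_tokens)")

    # Focus rate: mean of each frame's peak weight. Sharp alignments approach 1,
    # smeared ("lost") attention drifts toward 1 / text_tokens.
    focus_rate = float(attn.max(axis=1).mean())

    # Hard alignment path: the most-attended text token for every mel frame.
    path = attn.argmax(axis=1)

    # Frames whose token has fallen more than `backstep_tolerance` tokens behind
    # the furthest token reached so far (repeats, babbling, reversed alignment).
    furthest = np.maximum.accumulate(path)
    backward_frames = int(np.sum(path < furthest - backstep_tolerance))

    # Fraction of text tokens that win at least one frame; low values suggest
    # skipped or mumbled words.
    coverage = float(np.unique(path).size / attn.shape[1])

    suspicious = (
        focus_rate < min_focus
        or coverage < min_coverage
        or backward_frames > 0
    )
    return {
        "focus_rate": focus_rate,
        "backward_frames": backward_frames,
        "coverage": coverage,
        "suspicious": suspicious,
    }


# Synthetic clean alignment: 20 frames walking forward through 5 tokens.
clean = np.full((20, 5), 0.025)
clean[np.arange(20), np.arange(20) // 4] = 0.9
print(alignment_health(clean)["suspicious"])  # False

# Smeared alignment: every frame attends uniformly to all tokens.
smeared = np.full((20, 5), 0.2)
print(alignment_health(smeared)["suspicious"])  # True
```

Sorting sentences by these scores and listening to the flagged ones first would show whether the bad-sounding outputs really coincide with bad alignments, or whether the problem lies elsewhere (e.g. in the vocoder or the limited amount of voice data).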

[3 images attached: the training graphs and attention plots referred to above]