Voice usually good, but occasional low MOS

Hi, thanks again for providing this repo.

I'm fine-tuning a model with ~15 min of custom voice data, and the results are looking pretty good. However, some sentences (roughly 10-20%?) suddenly come out with a much lower MOS.

Config: learning rate 1e-4, ctc_loss_weight 0.01, use_attn_prior: true. Surprisingly, I haven't had to train with use_attn_prior: false to get good-sounding audio, although the voice doesn't sound that much like my custom voice.

Any suggestions, based on the graphs, for fixing the occasional low MOS? Am I interpreting the attention plots correctly: is my attention sometimes broken, and do I need the CTC loss to get closer to zero?
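In case it helps narrow this down, a rough way to flag the bad sentences numerically instead of by eye might look like the sketch below. It is only a heuristic, not anything from the repo: it assumes the attention map for one sentence has been dumped as a list of rows (one per mel frame) of weights over text tokens, and the function names and thresholds are made up for illustration.

```python
def alignment_stats(attn, max_jump=3):
    """Summarize one attention map.

    attn: list of rows, one per mel frame; each row holds the attention
    weights over text tokens (assumed layout, not taken from the repo).
    """
    if not attn or not attn[0]:
        raise ValueError("attention matrix must be non-empty")
    n_tokens = len(attn[0])
    # Index of the most-attended token for every frame.
    peaks = [max(range(n_tokens), key=row.__getitem__) for row in attn]
    # Frame-to-frame movement of that peak.
    steps = [b - a for a, b in zip(peaks, peaks[1:])]
    # A healthy alignment stays put or moves forward by a small amount.
    good = sum(1 for s in steps if 0 <= s <= max_jump)
    monotonic = good / len(steps) if steps else 1.0
    # Fraction of text tokens that are the peak for at least one frame.
    coverage = len(set(peaks)) / n_tokens
    return {"monotonic": monotonic, "coverage": coverage}


def looks_broken(attn, min_monotonic=0.9, min_coverage=0.8):
    """Flag a map whose peak jumps around or skips most of the text.

    The thresholds are arbitrary starting points, not tuned values.
    """
    stats = alignment_stats(attn)
    return stats["monotonic"] < min_monotonic or stats["coverage"] < min_coverage
```

Running something like this over the low-MOS sentences and a few good ones would show whether the bad outputs actually line up with non-diagonal attention, or whether the cause is somewhere else.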