Closed by paarthneekhara 4 years ago
We used the default value 1.0 as well.
Thanks @rafaelvalle! After around 50k training steps, the alignment map looks like this. I made one change to the implementation: I removed the conditioning on pitch contours (f0s).
Is this normal? Do you recall by any chance how long it takes for the alignment map to look like a diagonal line when training Mellotron on LibriTTS?
Ah, just noticed this in the paper. Makes sense, closing this issue. "In our setup, we find it easier to first learn attention alignments on speakers with large amounts of data and then finetune to speakers with less data. Thus, we first train Mellotron on LJS and Sally and finetune it with a new speaker embedding on LibriTTS, starting with a learning rate of 5e-4 and annealing the learning rate as the loss starts to plateau."
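For anyone landing here: "annealing the learning rate as the loss starts to plateau" can be implemented with something like PyTorch's `ReduceLROnPlateau`. Below is a minimal dependency-free sketch of that idea; the starting LR of 5e-4 matches the paper quote, but `factor` and `patience` are assumptions, not Mellotron's exact settings.

```python
class PlateauAnnealer:
    """Shrink the learning rate when the loss stops improving.

    A minimal stand-in for torch.optim.lr_scheduler.ReduceLROnPlateau.
    factor and patience below are illustrative values, not the ones
    used to train the released LibriTTS model.
    """

    def __init__(self, lr=5e-4, factor=0.5, patience=5):
        self.lr = lr
        self.factor = factor          # multiply LR by this on plateau
        self.patience = patience      # how many bad steps to tolerate
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss):
        """Call once per validation; returns the (possibly reduced) LR."""
        if loss < self.best:
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps > self.patience:
                self.lr *= self.factor
                self.bad_steps = 0
        return self.lr
```

In practice you would call `annealer.step(val_loss)` after each validation pass and write the returned value into the optimizer's param groups.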
I want to know what p_teacher_forcing was set to while training Mellotron. I am using the default value 1.0 and I am not able to get a proper alignment/attention map even after 100k steps. I was wondering if something else was used when training the LibriTTS model.
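For context, `p_teacher_forcing` is the probability that the decoder is fed the ground-truth frame instead of its own previous prediction at each step. Here is a hedged sketch of that mechanism with a hypothetical one-step decoder `model_step`; this is an illustration of the general technique, not Mellotron's actual decoder loop.

```python
import random

def decode(model_step, targets, p_teacher_forcing=1.0):
    """Autoregressive decode loop with probabilistic teacher forcing.

    model_step(prev_frame) -> next_frame is a hypothetical one-step
    decoder. With probability p_teacher_forcing the ground-truth frame
    is fed back as the next input; otherwise the model's own prediction
    is. p_teacher_forcing=1.0 (the default) always feeds ground truth.
    """
    outputs = []
    prev = targets[0]  # assume the first target frame is the start frame
    for t in range(1, len(targets)):
        pred = model_step(prev)
        outputs.append(pred)
        prev = targets[t] if random.random() < p_teacher_forcing else pred
    return outputs
```

With `p_teacher_forcing=1.0` the loop is fully teacher-forced, which is the usual setting while the attention alignment is still being learned.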