NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Training loss becomes nan when the number of speakers changes #130

Open Alexey322 opened 2 years ago

Alexey322 commented 2 years ago

Hi. I trained Flowtron on two speakers, 50 hours total (25 hours each). After that, I wanted to fine-tune the model on 10 speakers with 20-30 minutes each, using the 50-hour checkpoint as the base. I changed the number of speakers from 2 to 10 in the config and loaded all weights except the speaker embedding. At first the loss jumped from 10 to 10,000, and then it became NaN. The problem is probably that the model overfits to the old speaker embeddings, and when they change it cannot adapt to the randomly initialized speaker embedding. @rafaelvalle, please tell me if you have encountered this problem?
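For reference, the warm-start described above (loading every weight except the speaker embedding) amounts to filtering the checkpoint's state dict by key before loading. This is a minimal sketch, not Flowtron's actual API: `filter_state_dict` and the `speaker_embedding` key prefix are illustrative names, and toy strings stand in for the real tensors.

```python
def filter_state_dict(state_dict, skip_prefixes=("speaker_embedding",)):
    """Hypothetical helper: drop weights whose keys start with a skipped
    prefix, so the new, differently sized speaker-embedding table stays
    randomly initialized instead of clashing with the old one."""
    return {
        k: v for k, v in state_dict.items()
        if not k.startswith(tuple(skip_prefixes))
    }

# Toy checkpoint standing in for torch.load(path)["model"].state_dict():
checkpoint = {
    "speaker_embedding.weight": "2x128 table",      # old table, 2 speakers
    "encoder.lstm.weight_ih_l0": "encoder weights",
    "flows.0.attention_layer.query.weight": "flow weights",
}
filtered = filter_state_dict(checkpoint)

# With a real model you would then load the filtered weights:
# model.load_state_dict(filtered, strict=False)  # strict=False tolerates the dropped key
```

Note that `strict=False` is what lets `load_state_dict` accept a checkpoint with the speaker-embedding key missing; the new table then keeps its random initialization, which is exactly what makes the loss spike early in fine-tuning.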

v-nhandt21 commented 2 years ago

Me too.

I trained with only one speaker and it converged.

Then I tried to adapt it to 65 speakers, 15 minutes per speaker.

I followed this:

[screenshot]

Finally I got NaN:

[screenshot]

The question is whether I'm doing this right. I also tried to fine-tune as in the instructions, but it reports a speaker embedding mismatch.

Thank you @rafaelvalle

letrongan commented 2 years ago

Have you fixed it yet? @v-nhandt21

v-nhandt21 commented 2 years ago

> Have you fixed yet? @v-nhandt21

No, but I found that instead of changing the number of speakers in the pretrained model, we can just replace the embedding of one of the existing speakers.