NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
853 stars 187 forks

Two key points of training multispeaker mellotron #112

WelkinYang closed 2 years ago

WelkinYang commented 2 years ago

Two years after Mellotron was published, some people still cannot train a multi-speaker Mellotron successfully, so I want to share the two most important points:

1. Trim the leading and trailing silence from the training data.
2. Increase the dropout rate of the attention and decoder to 0.2 or more.
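As a rough illustration of the two points, here is a minimal sketch, not the exact preprocessing used for the model in this issue. `trim_silence` is a hypothetical helper that drops leading and trailing frames whose RMS energy is more than `top_db` below the loudest frame; the 40 dB threshold and the frame sizes are assumptions to tune per dataset. `dropout_overrides` is also hypothetical and assumes the hyperparameters are named `p_attention_dropout` and `p_decoder_dropout` and that the training script accepts comma-separated `name=value` overrides; check both against your copy of `hparams.py` and `train.py`.

```python
import numpy as np


def trim_silence(wav, top_db=40.0, frame_length=1024, hop_length=256):
    """Remove leading and trailing silence from a mono waveform.

    A frame counts as silent when its RMS energy is more than `top_db`
    decibels below the loudest frame. All thresholds are assumptions.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.size < frame_length:
        return wav  # too short to analyse; leave untouched

    # Frame-wise RMS energy over fully contained frames.
    n_frames = 1 + (wav.size - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(wav[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

    peak = rms.max()
    if peak <= 0.0:
        return wav  # completely silent clip; nothing sensible to trim

    # Energy of each frame relative to the loudest frame, in dB.
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / peak)
    voiced = np.flatnonzero(db > -top_db)

    # Keep everything from the first to the last non-silent frame.
    start = voiced[0] * hop_length
    end = min(wav.size, voiced[-1] * hop_length + frame_length)
    return wav[start:end]


def dropout_overrides(p=0.2):
    """Build an override string raising both dropout rates to `p`.

    The hyperparameter names are assumed, not verified against the repo.
    """
    return "p_attention_dropout={0},p_decoder_dropout={0}".format(p)
```

In practice you would run `trim_silence` over every training clip before extracting mel spectrograms, then pass the string from `dropout_overrides()` to the training script or edit the two values in the hyperparameter file directly.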

When your settings are right, the alignment appears within only 20,000 steps.

The image above is from a multi-speaker Mandarin Mellotron.
