NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Training on EmovDB #105

Open kngan43 opened 3 years ago

kngan43 commented 3 years ago

I am training Mellotron on EmovDB and have trimmed off the leading and trailing silences. Below is an image of the alignment after 36000 iterations. It doesn't look like it's aligning properly, but the model has not converged yet. Would this be an early sign that there is an issue with the training? Is there any advice on how to get it to align properly?

I also noticed that some audio clips start and end with noises like laughter. Would manually trimming these help?

[Image: alignment plot after 36000 iterations]

CookiePPP commented 3 years ago

You should definitely check your data. That alignment is unlikely to converge properly.
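One possible way to start checking the data is to look for clips whose duration is out of proportion to their transcript, since extra vocalizations or a mismatched transcript tend to show up that way. The sketch below is an assumption, not something from the Mellotron repository: the function name `flag_suspect_clips`, the `(name, duration, transcript)` input format, and the `tolerance` factor are all hypothetical and would need tuning for a real dataset.

```python
from statistics import median


def flag_suspect_clips(clips, tolerance=2.0):
    """Return names of clips whose seconds-per-character rate is an outlier.

    clips: list of (name, duration_seconds, transcript) tuples (hypothetical format).
    tolerance: a clip is flagged when its rate is more than `tolerance` times
               above or below the median rate (assumed heuristic).
    """
    # Seconds of audio per transcript character; guard against empty transcripts.
    rates = {
        name: duration / max(len(text.strip()), 1)
        for name, duration, text in clips
    }
    if not rates:
        return []
    mid = median(rates.values())
    return sorted(
        name
        for name, rate in rates.items()
        if rate > mid * tolerance or rate < mid / tolerance
    )


clips = [
    ("a.wav", 2.0, "hello there my friend"),
    ("b.wav", 2.2, "hello there my friends"),
    ("c.wav", 9.0, "hi there you"),  # much longer than its text suggests
    ("d.wav", 1.9, "good morning everyone"),
]
suspects = flag_suspect_clips(clips)
print(suspects)
```

Flagged clips are only candidates for manual listening. A slow speaking style can trigger the same heuristic, so it is a filter for review rather than an automatic exclusion rule.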

rafaelvalle commented 3 years ago

Trimming laughter and similar vocalizations will certainly help.
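For reference, a minimal sketch of energy-based edge trimming is shown below. It is an illustration under stated assumptions, not the preprocessing used by Mellotron: the function name `trim_edges`, the frame length, and the RMS threshold are hypothetical. It only removes quiet frames at the start and end of a clip. Laughter is usually loud enough to pass the threshold, so it would still need to be cut manually or from annotated time spans.

```python
def trim_edges(samples, frame_length=4, threshold=0.05):
    """Drop leading and trailing frames whose RMS energy is below `threshold`.

    samples: list of floats in [-1.0, 1.0] (assumed mono audio).
    frame_length and threshold are illustrative values, not tuned defaults.
    """
    def is_loud(start):
        frame = samples[start:start + frame_length]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        return rms >= threshold

    # Start offsets of all frames that carry enough energy.
    loud_starts = [s for s in range(0, len(samples), frame_length) if is_loud(s)]
    if not loud_starts:
        return []  # the whole clip is below the threshold
    # Keep everything from the first loud frame through the last loud frame.
    return samples[loud_starts[0]:loud_starts[-1] + frame_length]


silence = [0.0] * 8
speech = [0.5, -0.5, 0.4, -0.4]
trimmed = trim_edges(silence + speech + [0.0] * 4)
print(trimmed)
```

Real audio would use frames of a few hundred samples or more. The tiny frame length here just keeps the example readable.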
