NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Inference failed #66

Open lqniunjunlper opened 4 years ago

lqniunjunlper commented 4 years ago

When I use mellotron.inference(), I get "Warning! Reached max decoder steps". When I use inference_noattention() instead, the final audio is garbled.

Train loss 6010 0.283912 Grad Norm 0.313008 2.66s/it

BTW, which input feature matters most for Mellotron to produce good results?

rafaelvalle commented 3 years ago

Please share alignment maps, training and validation losses, and samples.
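For the training losses, a minimal sketch of how log lines like the one quoted above could be collected into a table. It assumes the fields are, in order, iteration, loss, gradient norm, and seconds per iteration; the thread does not confirm that layout, and `parse_train_log` is a hypothetical helper, not part of Mellotron.

```python
import re

# Assumed layout (not confirmed in this thread):
# "Train loss <iteration> <loss> Grad Norm <grad_norm> <seconds>s/it"
LOG_PATTERN = re.compile(
    r"Train loss (\d+) ([\d.]+) Grad Norm ([\d.]+) ([\d.]+)s/it"
)


def parse_train_log(lines):
    """Return (iteration, loss, grad_norm, sec_per_it) tuples for matching lines."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not training-step logs
        iteration, loss, grad_norm, seconds = match.groups()
        records.append(
            (int(iteration), float(loss), float(grad_norm), float(seconds))
        )
    return records


if __name__ == "__main__":
    sample = ["Train loss 6010 0.283912 Grad Norm 0.313008 2.66s/it"]
    for record in parse_train_log(sample):
        print(record)
```

Lines with non-numeric values such as `nan` are skipped by this pattern, so a diverged run would show up as missing iterations rather than as parsed records.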