NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
853 stars, 187 forks

Synthesized voice does not match the voice of the given speaker ID #107

Open tuanh123789 opened 2 years ago

tuanh123789 commented 2 years ago

I trained the model on my Vietnamese dataset (46 speakers). However, at inference time the output voice does not match the voice of the speaker ID I provide. Could you explain in more detail how to solve this problem? Thank you!