NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Rhythm is not good, so the inference audio output sounds weird? #88

Open ryanjfdeng1 opened 3 years ago

ryanjfdeng1 commented 3 years ago

I recorded part of a song and added its lyrics to examples_filelist.txt, for example: data/hh.wav|Hal le lu jah Hal le lu jah Hal le lu jah Hal le lu jah|1
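To make the entry format explicit, here is a minimal sketch of how such a line splits into its fields. It assumes the layout is `audio path|text|speaker id`, as the example line suggests; `parse_filelist_line` is a hypothetical helper written for illustration, not a function from the repository.

```python
def parse_filelist_line(line):
    """Split one pipe-separated filelist entry into its three fields.

    Assumed layout (inferred from the example line, not from the repo docs):
    audio path | text | integer speaker id
    """
    audio_path, text, speaker_id = line.strip().split("|")
    return audio_path, text, int(speaker_id)


entry = parse_filelist_line(
    "data/hh.wav|Hal le lu jah Hal le lu jah Hal le lu jah Hal le lu jah|1"
)
audio_path, text, speaker_id = entry
# The text field here holds 16 space-separated syllables.
syllables = text.split()
```

A line with a missing or extra `|` makes the unpacking raise `ValueError`, which is a quick way to catch a malformed entry before running inference.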

But the output is not good. How can I improve it? I think the rhythm returned by "mel_outputs, mel_outputs_postnet, gate_outputs, rhythm = mellotron.forward(x)" is not accurate. Can I use MusicXML to correct the rhythm, and if so, how?
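To illustrate what "correcting the rhythm from a score" could mean, here is a rough sketch that turns note lengths (in beats) into a hard token-to-frame alignment. Everything in it is an assumption made for illustration: the helper names are hypothetical, the 22050 Hz sampling rate and hop length of 256 are assumed defaults, and the sketch neither reads MusicXML nor calls any Mellotron code. Whether the model accepts a matrix built this way in place of the predicted rhythm is not confirmed here.

```python
def beats_to_frame_counts(beats, bpm, sampling_rate=22050, hop_length=256):
    """Convert per-token note lengths in beats to mel-frame counts.

    Cumulative end times are rounded instead of individual durations,
    so rounding errors do not pile up over a long phrase.
    sampling_rate and hop_length are assumed values, not read from a config.
    """
    frames_per_second = sampling_rate / hop_length
    counts = []
    elapsed_beats = 0.0
    previous_end = 0
    for note_beats in beats:
        elapsed_beats += note_beats
        end_frame = round(elapsed_beats * 60.0 / bpm * frames_per_second)
        counts.append(end_frame - previous_end)
        previous_end = end_frame
    return counts


def hard_alignment(frame_counts):
    """Build a (total_frames x num_tokens) one-hot alignment as nested lists.

    Row t has a single 1.0 in the column of the token sounding at frame t.
    """
    num_tokens = len(frame_counts)
    rows = []
    for token_index, count in enumerate(frame_counts):
        for _ in range(count):
            row = [0.0] * num_tokens
            row[token_index] = 1.0
            rows.append(row)
    return rows


# Two tokens: a quarter note and an eighth note at 120 beats per minute.
counts = beats_to_frame_counts([1, 0.5], bpm=120)
alignment = hard_alignment(counts)
```

In this sketch each output frame attends to exactly one token, so the timing comes entirely from the score rather than from the recording.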