NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Inference with our own text on pretrained model #78

Open · xw1324832579 opened this issue 4 years ago

xw1324832579 commented 4 years ago

Hello @robinsloan, thank you for sharing this amazing repo! During inference, I changed some words in hallelujah.musicxml and got incorrect output. I tried words such as Try, My, Why, Be and Tree, and it seems they are not converted to their correct phonemes (Try->T R, My->M, Why->W, Be->B, Tree->T R). I suspect something goes wrong after word_arpabet is obtained in the function events2eventsarpabet in mellotron_utils.py. Any ideas? Thank you.
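In every reported case, only the phonemes before the vowel survive. This fits a slicing bug around the vowel. The Python sketch below splits an ARPAbet sequence into onset, vowel nucleus and coda. It assumes only the CMUdict convention that vowels carry a stress digit (0, 1 or 2). The pronunciations are hand-copied CMUdict-style entries. The names EXPECTED, split_onset_nucleus_coda and split_off_by_one are hypothetical and are not taken from mellotron_utils.py. The second function is an illustrative off-by-one (an exclusive slice end) that reproduces the listed outputs exactly. It is one thing worth checking in events2eventsarpabet, not a claim about what that function actually contains.

```python
import re
from typing import Dict, List, Tuple

# Hand-copied CMUdict-style pronunciations for the words in the report.
# Assumption: as in CMUdict, vowel phonemes end in a stress digit (0, 1 or 2).
EXPECTED: Dict[str, List[str]] = {
    "try": ["T", "R", "AY1"],
    "my": ["M", "AY1"],
    "why": ["W", "AY1"],
    "be": ["B", "IY1"],
    "tree": ["T", "R", "IY1"],
}

# A phoneme is treated as a vowel if it is letters followed by one stress digit.
VOWEL = re.compile(r"^[A-Z]+[012]$")


def split_onset_nucleus_coda(
    phonemes: List[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Split a phoneme list into (onset, nucleus, coda) around its vowels."""
    vowel_idx = [i for i, p in enumerate(phonemes) if VOWEL.match(p)]
    if not vowel_idx:
        # No vowel found: keep every phoneme as onset so nothing is lost.
        return list(phonemes), [], []
    first, last = vowel_idx[0], vowel_idx[-1]
    # Inclusive end (last + 1) keeps a word-final vowel in the nucleus.
    return phonemes[:first], phonemes[first:last + 1], phonemes[last + 1:]


def split_off_by_one(
    phonemes: List[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical buggy variant: the exclusive slice end drops the last vowel."""
    vowel_idx = [i for i, p in enumerate(phonemes) if VOWEL.match(p)]
    if not vowel_idx:
        return list(phonemes), [], []
    first, last = vowel_idx[0], vowel_idx[-1]
    # For a one-vowel word, first == last, so this nucleus slice is empty.
    return phonemes[:first], phonemes[first:last], phonemes[last + 1:]


if __name__ == "__main__":
    for word, phones in EXPECTED.items():
        # sum(..., []) concatenates the three parts back into one list.
        good = sum(split_onset_nucleus_coda(phones), [])
        bad = sum(split_off_by_one(phones), [])
        print(f"{word}: {' '.join(good)} vs buggy {' '.join(bad)}")
```

Running it prints, for example, "try: T R AY1 vs buggy T R". The buggy column matches the reported Try->T R, My->M, Why->W, Be->B and Tree->T R. If the real code computes a vowel index and slices with it, an exclusive upper bound is a plausible place to look.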