Generic Text-to-Speech Inference

NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data

BSD 3-Clause "New" or "Revised" License

Generic Text-to-Speech Inference #57

Open GreenGarnets opened 4 years ago

GreenGarnets commented 4 years ago

As I understand it, Mellotron conditions Tacotron 2-based synthesis on a reference audio file or MusicXML score and performs style transfer accordingly. If no reference file is provided, is it possible to get a plain, generic TTS result instead? I looked through model.py but could not find anything that seemed relevant, which is why I'm asking.

rafaelvalle commented 4 years ago

Please rephrase your question; it is not clear to me what you are asking.