NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

inference speed on CPU #109

Open Adibian opened 2 years ago

Adibian commented 2 years ago

Hi. I am comparing the training and inference speed of different multi-speaker TTS models on a single CPU or a single GPU. I would appreciate any information on this for the current model or for any other multi-speaker TTS models.