NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Is there a way to extract character durations? #95

Closed: jmasterx closed 3 years ago

jmasterx commented 3 years ago

Hi

I would like to use a Mellotron model to train FastPitch. Since Mellotron is essentially a multispeaker Tacotron under the hood, is it possible to use it to extract the duration, in spectrogram frames, of each phoneme?

Thanks
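One common heuristic for any Tacotron 2-style attention model is to turn its attention alignment into hard per-token durations. Each decoder frame is assigned to the input token it attends to most, and the frames assigned to each token are counted. The sketch below assumes you already have an alignment matrix of shape `(n_frames, n_tokens)` from a teacher-forced pass over the ground-truth mel spectrogram, with one mel frame per decoder step. How to pull that matrix out of Mellotron is not shown here. The function name `durations_from_alignment` and the `n_tokens` parameter are hypothetical illustrations, not part of the Mellotron code.

```python
from typing import Optional

import numpy as np


def durations_from_alignment(alignment: np.ndarray,
                             n_tokens: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: convert an attention alignment into per-token durations.

    alignment: shape (n_frames, n_tokens_padded); row t holds the attention
        weights of decoder frame t over the encoder inputs (assumes one
        mel frame per decoder step).
    n_tokens: number of real (unpadded) input tokens; defaults to all columns.
    """
    if n_tokens is None:
        n_tokens = alignment.shape[1]
    # Hard alignment: each frame goes to the token with the largest weight,
    # ignoring any padded columns beyond n_tokens.
    winners = np.argmax(alignment[:, :n_tokens], axis=1)
    # Count frames per token; tokens that never win get a duration of 0.
    return np.bincount(winners, minlength=n_tokens)


# Toy alignment: 6 frames over 3 tokens.
toy = np.array([
    [0.9, 0.1, 0.0],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.4, 0.6],
    [0.0, 0.1, 0.9],
])
print(durations_from_alignment(toy))  # [1 3 2]
```

The durations always sum to the number of frames, which is what a FastPitch-style length regulator expects. The argmax rule does not enforce monotonicity, however, so a noisy or poorly converged alignment can produce zero-length tokens or frames that jump backwards. Check the alignments before you trust the extracted durations.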