NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

inference_noattention for new sequences #34

Open texpomru13 opened 4 years ago

texpomru13 commented 4 years ago

Can I use inference_noattention for new sequences, and not just for ground-truth ones? Or is only the "inference" method suitable for this?

If so, how do you obtain a suitable rhythm for the new sequence so that it copies the style of the selected audio?

rafaelvalle commented 4 years ago

inference_noattention requires passing the rhythm (alignment map) as input. This means you either need pre-existing audio from which to extract the alignment map, or you need to create the rhythm (alignment map) by hand.
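To illustrate the two options above, here is a minimal Python sketch of how a rhythm could be read off an extracted alignment map and rebuilt by hand. It assumes the alignment map is a matrix with one row per decoder (mel) frame and one column per input token; the helper names alignment_from_durations and durations_from_alignment are hypothetical and not part of the Mellotron API. Converting the result into a tensor with the batch and axis layout that inference_noattention expects is not shown and should be checked against the repository's model code.

```python
from typing import List

Matrix = List[List[float]]


def alignment_from_durations(durations: List[int]) -> Matrix:
    """Build a hard (one-hot) alignment map from per-token frame counts.

    Assumed layout (hypothetical): row t is decoder frame t, column i is
    input token i. Each frame attends with weight 1.0 to exactly one token,
    so token i is held for durations[i] consecutive frames.
    """
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    n_tokens = len(durations)
    alignment: Matrix = []
    for token, frames in enumerate(durations):
        for _ in range(frames):
            row = [0.0] * n_tokens
            row[token] = 1.0
            alignment.append(row)
    return alignment


def durations_from_alignment(alignment: Matrix) -> List[int]:
    """Recover per-token frame counts from a soft or hard alignment map.

    Each frame is assigned to its most-attended token (argmax); ties go to
    the earliest token. Same assumed (frames, tokens) layout as above.
    """
    if not alignment:
        return []
    n_tokens = len(alignment[0])
    durations = [0] * n_tokens
    for row in alignment:
        best = max(range(n_tokens), key=lambda i: row[i])
        durations[best] += 1
    return durations


if __name__ == "__main__":
    # Hand-made rhythm: hold token 0 for 3 frames, token 1 for 1, token 2 for 2.
    rhythm = alignment_from_durations([3, 1, 2])
    print(len(rhythm), durations_from_alignment(rhythm))
```

Working with durations rather than the raw matrix makes hand edits simple (lengthen or shorten a token by changing one number), at the cost of discarding any soft, multi-token attention present in an extracted map.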

anderleich commented 3 years ago

Inference is done with rhythm and pitch transfer. Is it possible to apply only the rhythm from my own new audio? I don't want to use the pitch_contour variable.

Moreover, is it possible to apply a rhythm extracted from audio to arbitrary text? Or can it only be applied to text for which we already have audio?

tuanh123789 commented 2 years ago


Can you tell me whether it is possible to apply a rhythm extracted from audio to arbitrary text at inference time? Thank you so much.