NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Question about custom dataset #60

Closed LucasRotsen closed 4 years ago

LucasRotsen commented 4 years ago

Hi everyone!

Firstly, thank you for the great implementation.

I don't yet understand how I should prepare my data for training, so I'd appreciate it if someone could clarify that for me. My assumptions are:

Are my assumptions correct?

rafaelvalle commented 4 years ago

Yes, that's a good start! Make sure you trim silences at the beginning and end of each audio file, and that each transcript matches its audio file.
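
For readers preparing their own data, here is a minimal sketch of the trimming step. It is not taken from the Mellotron codebase: the function name `trim_silence` and the `threshold` value are hypothetical, and it assumes the audio is already loaded as a sequence of floats in [-1.0, 1.0]. It uses a simple per-sample amplitude threshold; an energy-based tool such as `librosa.effects.trim` is likely a better fit for real recordings.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose amplitude is at or below threshold.

    samples: sequence of floats in [-1.0, 1.0] (assumed already decoded)
    threshold: hypothetical cutoff below which a sample counts as silence
    """
    # Indices of samples loud enough to count as signal.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole clip is at or below the threshold
    # Keep everything between the first and last loud sample, inclusive,
    # so quiet passages in the middle of the clip are preserved.
    return list(samples[loud[0]:loud[-1] + 1])


clip = [0.0, 0.001, 0.5, -0.3, 0.0, 0.2, 0.0]
print(trim_silence(clip))  # [0.5, -0.3, 0.0, 0.2]
```

Only the edges are trimmed; pauses inside the utterance are left alone so the audio still lines up with its transcript.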

LucasRotsen commented 4 years ago

Thanks for the quick reply, @rafaelvalle!