NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
853 stars, 187 forks

Training on a different language #102

Open: OSSome01 opened this issue 2 years ago

OSSome01 commented 2 years ago

First of all, thanks for the wonderful work, guys! I have a couple of questions about training Mellotron on languages other than English. I am new to this domain, so forgive me if these are stupid questions.

  1. Can I train Mellotron on Hindi? It is written in the Devanagari script.
  2. The research paper mentions that we require text, audio files and speaker IDs. I have a dataset available, but its text is in the Devanagari script, so I doubt it would work out of the box. I think I'll have to transliterate the input text. Is there any other way to do it?
  3. Also, at inference time, how can I get the rhythm of the source audio?
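Regarding question 2, one alternative to transliteration is to feed Devanagari characters directly as input symbols, since Tacotron 2-style models consume text as a sequence of integer symbol IDs. The sketch below is a minimal illustration under that assumption: the function names (`build_symbol_table`, `text_to_sequence_devanagari`) and the `_` padding / `~` end-of-sequence symbols are hypothetical and not part of the Mellotron codebase, whose own text-processing module would need to be adapted instead.

```python
import unicodedata

# Hypothetical special symbols; Mellotron's own symbol list may differ.
PAD = "_"
EOS = "~"


def normalize(text: str) -> str:
    # NFC normalization so that composed and decomposed forms of the
    # same Devanagari character map to the same code points.
    return unicodedata.normalize("NFC", text.strip())


def build_symbol_table(corpus):
    """Collect every character seen in the corpus into a symbol -> ID table."""
    chars = sorted({ch for line in corpus for ch in normalize(line)} - {PAD, EOS})
    symbols = [PAD, EOS] + chars  # reserve IDs 0 and 1 for the special symbols
    return {s: i for i, s in enumerate(symbols)}


def text_to_sequence_devanagari(text, table):
    """Map a sentence to integer IDs, skipping characters not in the table."""
    ids = [table[ch] for ch in normalize(text) if ch in table]
    return ids + [table[EOS]]


if __name__ == "__main__":
    corpus = ["नमस्ते दुनिया", "मेरा नाम राम है"]
    table = build_symbol_table(corpus)
    print(len(table))
    print(text_to_sequence_devanagari("नमस्ते", table))
```

Character-level input avoids any information lost in transliteration, but the model then has to learn the mapping from Devanagari graphemes to sounds from the training data alone; running the text through a Hindi phonemizer first is another option.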

Thanks!!

OSSome01 commented 2 years ago

Also, approximately how much audio data is required to get good results?