NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[FastPitch/PyTorch/Tacotron] I'd like to train some models. Where do I begin, how do I start? #847

Closed: StElysse closed this issue 3 years ago

StElysse commented 3 years ago

(I am not a computer programmer and am just scratching the surface of machine learning while trying to make my creative project work. Any help would be greatly appreciated.)

For the past three weeks, I have been training TTS synthesizer models on Windows with Tacotron using this repo. I'm trying to synthesize voices for a creative project, but that repo currently offers no effective control over the pitch of the speech I want to generate. I discovered xVASynth, and then FastPitch, and I'd like to see what is possible for my project with FastPitch.

Installing the above-mentioned repo was only possible with the help of a blog tutorial and patient error-searching; what is the process for installing and then training FastPitch models on Windows? I would like to train those models and then drop them into the GUI/program crafted by Dan Ruta so that I can generate audio whose intonation I can control. I already have some synthesizer (not vocoder, not encoder) Tacotron models trained with the Real-Time-Voice-Cloning repo; is it possible to use them with FastPitch? How can I use my old LibriTTS datasets if I must tune new models?

Thank you all so much.

alancucki commented 3 years ago

Hi @StElysse,

what is the process for installing and then training FastPitch models on Windows?

Unfortunately, we do not provide OS-specific support. The easiest route is to use Ubuntu. On Windows, you need to set up GPU drivers, the CUDA toolkit, Python, and all the necessary packages. You can go the Docker route or set everything up separately. Have a look at a recent discussion: https://github.com/NVIDIA/DeepLearningExamples/issues/831.

I already have some synthesizer (not vocoder, not encoder) Tacotron models trained with the Real-Time-Voice-Cloning repo; is it possible to use them with FastPitch?

Yes; you'll need such a model for duration extraction. Try adapting extract_mels.py to use your third-party Tacotron2 code.
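
For reference, the core of the duration extraction looks roughly like this: assign each mel frame to the input token it attends to most, then count frames per token. This is a minimal sketch; the helper name and the alignment shape are assumptions about a generic Tacotron2 implementation, not code taken from extract_mels.py.

```python
import torch

def durations_from_alignment(alignment: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn a Tacotron2 attention alignment of
    shape (mel_frames, text_tokens) into per-token durations."""
    # for every mel frame, pick the text token it attends to most
    token_ids = alignment.argmax(dim=1)  # (mel_frames,)
    # count how many frames each token received; unattended tokens get 0
    return torch.bincount(token_ids, minlength=alignment.size(1))
```

By construction, the durations sum to the number of mel frames, which keeps them consistent with the mel-spectrogram length FastPitch trains against.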

How can I use my old LibriTTS datasets if I must tune new models?

What do you mean by "tune new models"? Have you already trained Tacotron2 on LibriTTS?

StElysse commented 3 years ago

What do you mean by "tune new models"? Have you already trained Tacotron2 on LibriTTS?

Yes, over the past few weeks I have trained a number of synthesizer models on LibriTTS datasets with Tacotron2, using CorentinJ's repo that I mentioned in my original post.

alancucki commented 3 years ago

Great, in that case you'd need to adapt prepare_dataset.sh to your models. Each Tacotron2 model should be able to properly align the dataset it was trained on.

If the repository you have used for Tacotron2 is not compatible, don't get discouraged. I'd modify that third-party code so that it saves alignment matrices during training, or better yet, saves durations using this code. Write the durations to disk and load them directly in extract_mels.py without any dependence on Tacotron2; this would require some minor modifications to extract_mels.py. A rough sketch of that workflow follows.
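
Here is one way that could look; the output directory, file naming, and the `utt_id` identifier are my own assumptions for illustration, not part of either repo.

```python
from pathlib import Path
import torch

DUR_DIR = Path("durations")  # assumed output directory
DUR_DIR.mkdir(exist_ok=True)

def save_durations(utt_id: str, alignment: torch.Tensor) -> None:
    """Call from the Tacotron2 training loop once the model aligns well;
    reuses the argmax/bincount idea from the earlier sketch."""
    token_ids = alignment.argmax(dim=1)
    durations = torch.bincount(token_ids, minlength=alignment.size(1))
    torch.save(durations.cpu(), DUR_DIR / f"{utt_id}.pt")

def load_durations(utt_id: str) -> torch.Tensor:
    """In a modified extract_mels.py, call this instead of running
    Tacotron2 inference."""
    return torch.load(DUR_DIR / f"{utt_id}.pt")
```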