descriptinc / melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MIT License
964 stars, 214 forks

Training time and how to start? #7

Open omiano opened 4 years ago

omiano commented 4 years ago

Hi, I'm interested in using your code but have a few questions. I saw that you trained the model on 15 hours of audio from each speaker. How long did training take? Also, your README has some commands for preparing the audio part of the dataset. How should we prepare the matching transcriptions of the audio? It would be really nice to have some sort of "Quick Start" guide that shows exactly which commands to run and in what order, kind of like what is done here. Thank you so much!

jeewenjie commented 4 years ago

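This is a vocoder, so there is no need to prepare transcriptions: it converts mel-spectrograms to waveforms, and the spectrograms are computed from the audio itself. More information can be found in their paper.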
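To illustrate the point about transcriptions, here is a minimal sketch of audio-only dataset preparation. It assumes a folder of `.wav` files and produces plain-text file lists for training and testing. The function names, the output file name `train_files.txt`, and the 10-file held-out split are illustrative assumptions, not something confirmed in this thread, so check the repository's README for the format it actually expects.

```python
from pathlib import Path


def split_wav_files(wav_dir, n_test=10):
    """Return (train, test) lists of .wav paths found directly in wav_dir.

    The first n_test files in sorted order form the test split and the
    rest form the training split. No transcription files are read.
    (Hypothetical helper; the split size is an assumption.)
    """
    wavs = sorted(str(p) for p in Path(wav_dir).glob("*.wav"))
    return wavs[n_test:], wavs[:n_test]


def write_file_list(paths, out_path):
    """Write one audio path per line to out_path."""
    Path(out_path).write_text("".join(f"{p}\n" for p in paths))


if __name__ == "__main__":
    # Example usage with assumed folder and file names.
    train, test = split_wav_files("wavs", n_test=10)
    write_file_list(train, "train_files.txt")
    write_file_list(test, "test_files.txt")
```

The only input is the audio folder. Nothing in the sketch reads or writes text labels, which is consistent with a vocoder being trained on audio alone.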