Open mathigatti opened 4 years ago
Hi @rafaelvalle, can you please answer the questions in this issue? I'm having similar problems and can't achieve the quality reported in the paper.
Let us know if you have specific issues.
@rafaelvalle, I trained a new speaker with 17 minutes of speech data. After 9k iterations it produced a good alignment, so I used the checkpoint with the best alignment to test speaking-style transfer. The style transfer worked well and the trained speaker's words are clearly intelligible, but the voice was a little croaky. The original recordings of the trained speaker are not croaky.
Which of the training params in hparams.py can I tune to get rid of the croakiness so the voice sounds as smooth as the trained speaker's?
Should I also adjust F0_min higher or lower? Are there any other params worth adjusting?
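Not part of the repo, but rather than guessing F0_min, one option is to measure the speaker's actual pitch range from the recordings and set the bounds just outside it. A minimal autocorrelation-based sketch (the function name and defaults are my own, not from this codebase):

```python
import numpy as np

def estimate_f0(signal, sr, f0_min=80.0, f0_max=400.0):
    """Estimate the fundamental frequency of a voiced frame via autocorrelation."""
    # Restrict the lag search to the plausible F0 range.
    lag_min = int(sr / f0_max)
    lag_max = int(sr / f0_min)
    sig = signal - signal.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / best_lag

# Sanity check on a synthetic 120 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 120 * t)
print(estimate_f0(tone, sr))  # close to 120 Hz
```

Running this over voiced frames of the training audio and taking, say, the 5th and 95th percentiles gives a data-driven choice for F0_min/F0_max.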
I'm also wondering whether I should increase the voice data from 17 to 25 minutes and re-train the speaker?
Hi, thanks for this amazing project! I wanted to ask a few short questions.
I want to train the model on a new voice. The dataset is similar to the LJ Speech Dataset: short clips of a single person (a man in this case) speaking English, each between 1 and 10 seconds long, about 6 hours in total. I plan to use the pretrained LibriTTS model you provide as a starting point.
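When warm-starting from the LibriTTS checkpoint, layers whose shapes depend on the speaker count (e.g. the speaker-embedding table) won't match a single-speaker setup. A common PyTorch pattern, sketched here with toy stand-in modules rather than the repo's actual model classes, is to copy only the parameters whose names and shapes match:

```python
import torch.nn as nn

# Toy stand-ins for the pretrained and new models (hypothetical sizes).
pretrained = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "speaker_embedding": nn.Embedding(123, 4),  # many LibriTTS speakers
})
new_model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "speaker_embedding": nn.Embedding(1, 4),    # single new speaker
})

# Copy only weights whose names and shapes match; skip the rest
# (here, the speaker-embedding table, whose size changed).
src = pretrained.state_dict()
dst = new_model.state_dict()
loaded = {k: v for k, v in src.items() if k in dst and v.shape == dst[k].shape}
dst.update(loaded)
new_model.load_state_dict(dst)
print(sorted(loaded))  # -> ['encoder.bias', 'encoder.weight']
```

The mismatched layers are then trained from scratch while everything else starts from the pretrained weights.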