Hello,
I have two questions:
1- I have a relatively large amount of data (80+ hours), but the voice I synthesize is still metallic, and the minimum average loss I reached was 1.01 after 200k steps. I'm not sure which hparams I should tweak to further improve quality; in older issues in this repo I saw that many people reached an average loss < 0.6 in fewer steps.
2- When training WaveNet, the predicted wavs generated in logs-Tacotron-2/wavs/ are too short (they can hardly complete a word). Doesn't that affect training quality? And if it does, how can I lengthen them?
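(My current guess, in case it helps: I assume the clip length is capped by max_time_steps in hparams.py, which, if I read the WaveNet data feeder right, crops each training example to that many raw-audio samples to fit in GPU memory. A sketch of what I mean; names from the repo, values illustrative:)

```python
# Sketch of the hparams.py entries I think are involved (names from the
# repo as I understand it; exact defaults may differ in my attached file):
sample_rate = 22050

# I assume WaveNet training crops each audio example to at most this many
# raw samples so batches fit in GPU memory; ~11000 samples at 22050 Hz is
# only about half a second, which would explain the one-word clips.
max_time_steps = 11000

# If raising this cap is the right fix, something like the line below
# would give ~3 s clips (at the cost of more GPU memory):
# max_time_steps = 3 * sample_rate
```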
I've replaced LJSpeech with my data and adjusted the params a little because my speaker is male.
Here is my hparams file:
hparams.txt
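For context, the male-voice tweak I mean is mainly the mel filterbank range; a sketch of the relevant entries (names from the repo's hparams.py; see the attached file for my exact values):

```python
# Audio settings I adjusted for a male speaker (illustrative values;
# my exact settings are in the attached hparams.txt):
sample_rate = 22050
num_mels = 80

# Lowering fmin captures a male speaker's lower pitch range
# (roughly 65-260 Hz for male vs ~100-525 Hz for female voices).
fmin = 55   # the repo's comments suggest 55 for male, 95 for female speakers
fmax = 7600
```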