chazo1994 closed this issue 2 years ago.
I think you should train the model for more steps. In my experiments, the mel loss decreased to 0.22 to 0.23.
Thanks, I will continue training to 400k steps and share the results. But I'm still confused: I used the same dataset and the default parameters, and trained to 100k steps (the same step count as the pretrained model), yet both the loss and the audio quality are worse.
I found that the problem comes from Montreal Forced Aligner version 2.0.0a22 and newer, which do not put "sp" or "sil" in the phone tier. To fix this, add the "--disable_textgrid_cleanup" flag during the alignment step.
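Before retraining, it can help to confirm whether your existing TextGrids actually contain silence labels. Below is a minimal sketch, assuming the TextGrids are in Praat's long text format and the phone tier is named "phones" (MFA's usual output; adjust if yours differs). The helper names `parse_interval_tiers` and `phone_tier_has_silence` are hypothetical, and treating "spn" as a silence label is an assumption.

```python
import re
from typing import Dict, List, Optional, Tuple

# Labels treated as silence/pause tokens. "sp" and "sil" come from the thread;
# including "spn" is an assumption.
SILENCE_LABELS = {"sp", "sil", "spn"}

Interval = Tuple[float, float, str]


def parse_interval_tiers(textgrid_text: str) -> Dict[str, List[Interval]]:
    """Hypothetical minimal parser for Praat's long ("ooTextFile") TextGrid format.

    Returns {tier_name: [(xmin, xmax, label), ...]}. Point tiers and the short
    TextGrid format are not handled.
    """
    tiers: Dict[str, List[Interval]] = {}
    tier: Optional[str] = None
    in_interval = False
    xmin: Optional[float] = None
    xmax: Optional[float] = None
    for raw in textgrid_text.splitlines():
        line = raw.strip()
        if m := re.match(r'name = "(.*)"$', line):
            # A new tier starts; tier-level xmin/xmax lines are skipped below.
            tier = m.group(1)
            tiers[tier] = []
            in_interval = False
        elif re.match(r"intervals \[\d+\]:", line):
            # Start of one interval block: expect xmin, xmax, text next.
            in_interval, xmin, xmax = True, None, None
        elif in_interval and tier is not None:
            if m := re.match(r"xmin = (\S+)", line):
                xmin = float(m.group(1))
            elif m := re.match(r"xmax = (\S+)", line):
                xmax = float(m.group(1))
            elif m := re.match(r'text = "(.*)"$', line):
                if xmin is not None and xmax is not None:
                    tiers[tier].append((xmin, xmax, m.group(1)))
                in_interval = False
    return tiers


def phone_tier_has_silence(tiers: Dict[str, List[Interval]],
                           tier_name: str = "phones",
                           labels=SILENCE_LABELS) -> bool:
    """True if any interval in the given tier carries a silence label."""
    return any(label in labels for _, _, label in tiers.get(tier_name, []))


# Small hand-written example TextGrid (not real aligner output).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.6
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 0.6
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "sil"
        intervals [2]:
            xmin = 0.2
            xmax = 0.4
            text = "HH"
        intervals [3]:
            xmin = 0.4
            xmax = 0.6
            text = "AY1"
'''

if __name__ == "__main__":
    tiers = parse_interval_tiers(SAMPLE)
    print(tiers["phones"])
    print("silence labels present:", phone_tier_has_silence(tiers))
```

Running this over every TextGrid in the alignment output directory and finding no silence labels at all would be consistent with the missing "sp"/"sil" problem described above; in that case, re-running the alignment with "--disable_textgrid_cleanup" is the fix reported in this thread.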
Oh, thanks for finding the problem! I should update the repo to match the newer versions of these libraries.
I have trained a StyleSpeech model on LibriTTS, but the quality is far worse than the author's pretrained StyleSpeech model. I used the default config and parameters and trained the model for 100k steps. The loss looks like this:
I have also uploaded my audio samples, synthesized from the same texts as on the demo page, in the folder Train_LibriTTS_StyleSpeech in the attached file. There are always strange sounds at the end of each audio file, and I can't explain them. meta_stylespeech_results.zip