JarodMica / ai-voice-cloning

GNU General Public License v3.0

Quirky Issues Training a Chinese Tortoise-TTS Model #90

Open Dinxin opened 7 months ago


Hello. I recently trained a Tortoise-TTS model on 300 hours of Chinese speech data, with an average sample duration of 13 seconds. However, I ran into a very peculiar issue: the mel and text losses decrease normally on the training set but increase sharply on the validation set, which suggests the model is overfitting.
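To make the symptom concrete, here is a minimal sketch of how this kind of train/validation divergence could be flagged from logged loss values. It is not part of the training scripts in this repo: the function name `first_divergence_step`, the `patience` parameter, and the example numbers are all hypothetical, and the losses are assumed to be recorded at the same evaluation steps.

```python
from typing import Optional, Sequence


def first_divergence_step(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    patience: int = 3,
) -> Optional[int]:
    """Return the index of the best validation loss once validation loss has
    failed to improve for `patience` evaluations while training loss is lower
    than it was at that best index. Return None if no such point is found."""
    if len(train_loss) != len(val_loss):
        raise ValueError("train_loss and val_loss must have the same length")

    best_idx = 0  # evaluation index with the lowest validation loss so far
    stale = 0     # evaluations since validation loss last improved

    for i in range(1, len(val_loss)):
        if val_loss[i] < val_loss[best_idx]:
            best_idx = i
            stale = 0
        else:
            stale += 1
            # Training keeps improving while validation does not: overfitting signal.
            if stale >= patience and train_loss[i] < train_loss[best_idx]:
                return best_idx
    return None


# Made-up numbers for illustration only, not the losses from this run.
example_train = [3.0, 2.5, 2.1, 1.8, 1.5, 1.3]
example_val = [3.1, 2.7, 2.6, 2.9, 3.3, 3.8]
best = first_divergence_step(example_train, example_val)
```

The returned index marks the checkpoint with the lowest validation loss before the curves split, which is a reasonable candidate to keep if early stopping is used.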

Here are my parameter settings:

The loss curves I observed are: image

When I reduced the learning rate to 5e-5, the problem was somewhat alleviated, but the trained model still generalizes poorly to the validation set. image

For the Chinese speech corpus, I used the G2Pw-pinyin module from the Bert-VITS2 repo (Extra-Fix branch) to convert the Chinese characters into the corresponding pinyin phonemes. Some data points look like this: image
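One sanity check that may help rule out a text-side cause is to look for pinyin tokens that appear in the validation transcripts but never in the training transcripts. The sketch below assumes transcripts are whitespace-separated pinyin tokens with tone digits; that format, the function name `unseen_tokens`, and the sample strings are assumptions for illustration, not the actual data shown in the screenshot.

```python
from collections import Counter
from typing import Iterable


def unseen_tokens(train_lines: Iterable[str], val_lines: Iterable[str]) -> Counter:
    """Count validation tokens that never occur in the training transcripts.

    Assumes each line is a transcript of whitespace-separated tokens."""
    train_vocab = {tok for line in train_lines for tok in line.split()}
    return Counter(
        tok
        for line in val_lines
        for tok in line.split()
        if tok not in train_vocab
    )


# Hypothetical transcripts, invented for this example.
train_sample = ["ni3 hao3", "zai4 jian4"]
val_sample = ["ni3 men5 hao3", "men5"]
missing = unseen_tokens(train_sample, val_sample)
```

A large number of unseen tokens would point to a mismatch in text preprocessing between the two splits rather than to overfitting alone.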