SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Training the model from scratch, pronunciation is unintelligible #413

Open yygg678 opened 5 days ago

yygg678 commented 5 days ago

Checks

I have thoroughly reviewed the project documentation and read the related paper(s).

Question details

Using my own phone sequence, I trained the model from scratch with about 200 hours of Chinese data and a 155M-parameter model. The synthesized speech is completely unintelligible. How much data is generally needed to train a model from scratch?

SWivid commented 5 days ago

All details are given in our paper, including the training corpus used for the small model, the batch size, and evaluation results from 400k to 800k updates. Training with the same batch size to approximately 200k updates should yield something intelligible.
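
To make "train to roughly 200k updates" concrete, here is a minimal back-of-envelope sketch relating dataset hours, effective batch size, and updates per epoch. The frame rate and batch size below are illustrative assumptions, not values from the paper; substitute the numbers from your own training config.

```python
# Rough estimate of optimizer updates per epoch for a given amount of audio.
# All concrete numbers are hypothetical placeholders, not paper values.

def updates_per_epoch(dataset_hours: float,
                      batch_frames: int,
                      frames_per_second: float = 93.75) -> float:
    """Gradient updates one pass over the data provides.

    dataset_hours:     total hours of audio in the training set.
    batch_frames:      effective batch size in mel frames
                       (per-GPU frames * num GPUs * grad accumulation).
    frames_per_second: mel frame rate; 93.75 assumes 24 kHz audio with
                       hop length 256 (adjust to your config).
    """
    total_frames = dataset_hours * 3600 * frames_per_second
    return total_frames / batch_frames

if __name__ == "__main__":
    hours = 200                 # dataset size from this issue
    batch_frames = 38_400       # hypothetical effective batch size in frames
    per_epoch = updates_per_epoch(hours, batch_frames)
    target_updates = 200_000    # point where speech reportedly becomes intelligible
    print(f"~{per_epoch:,.0f} updates/epoch -> "
          f"~{target_updates / per_epoch:,.1f} epochs to reach {target_updates:,} updates")
```

With a small dataset, reaching a given update count simply means many more passes over the same audio, so matching the paper's effective batch size and update count matters more than epoch count alone.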