alphacep / vosk-tts

Text To Speech Synthesis with Vosk
Apache License 2.0

Amount of data required for successful fine-tuning #30

Open ChernovSO opened 1 month ago

ChernovSO commented 1 month ago

The README in the train folder says we can try starting with a small dataset (50 utterances). I prepared 50 minutes of audio, and the result was bad. Training ran for 2200 epochs. Over the first 100 epochs the model's output gradually moved closer to the target voice (though not close enough), but over the remaining 2100 epochs the audio degraded and became robotic.

As far as I understand, the data was insufficient. How much data would you recommend?

nshmyrev commented 1 month ago

Please share audio examples: a sample of the original recording and a synthesized file as well.