The amount of data required for successful finetuning

The README in the train folder says that we can try starting with a small dataset (50 utterances). I prepared 50 minutes of audio, and the result was bad. Training ran for 2200 epochs. Over the first 100 epochs the model's voice gradually moved closer to the target voice (though not close enough), but over the remaining 2100 epochs the sound degraded and became robotic.
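The README counts data in utterances, while the dataset above is measured in minutes. A rough way to report both numbers for a training set is sketched below. This is a minimal sketch using only Python's standard-library `wave` module. It assumes the training audio is stored as uncompressed `.wav` files in a single folder. `dataset_size` is a hypothetical helper, not part of vosk-tts.

```python
import wave
from pathlib import Path


def dataset_size(wav_dir):
    """Return (number of utterances, total duration in minutes).

    Hypothetical helper: assumes one utterance per *.wav file in wav_dir.
    """
    count = 0
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, total_seconds / 60.0
```

For example, `dataset_size("wavs")` on a hypothetical `wavs` folder returns the utterance count and the total minutes, so both figures can be quoted when comparing against a recommendation.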

As far as I understand, the data was insufficient. How much data would you recommend?

alphacep / vosk-tts, issue #30