anhnh2002 / XTTSv2-Finetuning-for-New-Languages


For 100 hours, do I need one speaker? #19

Closed desis123 closed 1 week ago

desis123 commented 1 week ago

I have found that a good model requires approximately 100 hours of audio data for training. My question is, does this 100-hour requirement need to consist of a single speaker's voice, or can it include multiple speakers to reach the total duration?

anhnh2002 commented 1 week ago

The 100 hours of audio do not need to come from a single speaker; recordings from multiple speakers can be combined to reach the total. However, the model will clone the voices of speakers that appear in the training data better than the voices of speakers it has not seen.
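
For anyone checking how their own dataset splits across speakers, here is a minimal sketch that tallies hours per speaker and in total. It is not taken from this repository: the function names and the `(speaker_name, duration_seconds)` clip list are hypothetical, and it assumes you have already measured each clip's duration yourself.

```python
from collections import defaultdict


def hours_per_speaker(clips):
    """Sum audio duration per speaker, in hours.

    clips: iterable of (speaker_name, duration_seconds) pairs.
    This input shape is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for speaker, seconds in clips:
        totals[speaker] += seconds / 3600.0
    return dict(totals)


def total_hours(clips):
    """Total duration across all speakers, in hours."""
    return sum(hours_per_speaker(clips).values())


# Hypothetical clips from three speakers (durations in seconds).
clips = [
    ("speaker_a", 7200.0),
    ("speaker_b", 3600.0),
    ("speaker_a", 1800.0),
    ("speaker_c", 5400.0),
]

per_speaker = hours_per_speaker(clips)
print(per_speaker)
print(total_hours(clips))
```

The total is what counts toward the roughly 100 hours discussed above, while the per-speaker breakdown shows which voices the model has actually heard during training.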