How much improvement in training quality can I expect from using a 3-hour audio dataset compared to a 30-minute one?

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

MIT License

36.26k stars 4.14k forks

How much improvement in training quality can I expect from using a 3-hour audio dataset compared to a 30-minute one? #1816

Open Nisekoi-1 opened 2 days ago

Nisekoi-1 commented 2 days ago

Is the improvement in training quality worth the extra time required?

Chi8wah commented 2 days ago

According to this video, datasets longer than 30 minutes start to show diminishing returns. Based on my own practice, though, I personally recommend the longer the better, even though it may be hard to tell the two results apart.