About merging small dataset with large datasets

godspirit00 commented 2 years ago

I read the post here about fine-tuning on a small dataset, and it has been very helpful. I have a few questions: does the large dataset have to include many speakers? How much data should there be for each speaker? Should it be roughly the same amount for each speaker? Specifically, I intend to fine-tune on a ~30 min dataset of a British male speaker. I have the Nancy corpus (~16 hours, American female), a dataset I made from an audiobook (~8 hours, American female), and a dataset of ~5 hours of an American male.
Are these enough to build a multi-speaker dataset for fine-tuning on the small dataset? Do I still need to get LibriTTS? I don't care about the accent of the result, but I need it to sound clear and, preferably, expressive.

Thanks a lot!
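To make the balance of the corpora in the question concrete, here is a small tally using the durations stated above. The dictionary layout and the helper names are hypothetical illustrations, not a TensorFlowTTS data format.

```python
# Durations in hours, as described in the question (hypothetical layout).
corpora = {
    "nancy": {"hours": 16.0, "gender": "F", "accent": "American"},
    "audiobook": {"hours": 8.0, "gender": "F", "accent": "American"},
    "male_5h": {"hours": 5.0, "gender": "M", "accent": "American"},
    "target": {"hours": 0.5, "gender": "M", "accent": "British"},
}


def share_by_corpus(corpora):
    """Return each corpus's fraction of the total training hours."""
    total = sum(c["hours"] for c in corpora.values())
    return {name: c["hours"] / total for name, c in corpora.items()}


def hours_by_gender(corpora):
    """Sum the hours for each gender label."""
    totals = {}
    for c in corpora.values():
        totals[c["gender"]] = totals.get(c["gender"], 0.0) + c["hours"]
    return totals


if __name__ == "__main__":
    for name, share in share_by_corpus(corpora).items():
        print(f"{name:10s} {share:6.1%}")
    print(hours_by_gender(corpora))
```

With these numbers the 30-minute target voice is under 2% of the combined data, and female speech outweighs male speech by more than 4 to 1.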

TheHonestBob commented 2 years ago

In my experience (fs2 + mb-melgan for Mandarin), you just need one large dataset as speaker 1; the other speakers only need about 30 min each. If you need high performance, you should train on male-only or female-only data.
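One way to read this advice as a data-preparation step, sketched under assumptions: keep only the target gender, give the largest corpus the first speaker ID (0 here), and flag any other speaker with less than about 30 minutes. The `plan_speakers` helper and the corpus dictionary are hypothetical, not part of TensorFlowTTS.

```python
# ~30 min per additional speaker, following the reply above.
MIN_HOURS_PER_EXTRA_SPEAKER = 0.5


def plan_speakers(corpora, gender):
    """Assign speaker IDs for a single-gender multi-speaker dataset.

    The largest corpus of the requested gender becomes speaker 0 (the
    "large dataset"); the rest follow in descending size. Returns the
    list of (speaker_id, name, hours) and the names of extra speakers
    that fall below the ~30 min guideline.
    """
    same_gender = [
        (name, c["hours"]) for name, c in corpora.items() if c["gender"] == gender
    ]
    if not same_gender:
        raise ValueError(f"no corpora with gender {gender!r}")
    # Largest corpus first, so it gets speaker ID 0.
    same_gender.sort(key=lambda item: item[1], reverse=True)
    plan = [(sid, name, hours) for sid, (name, hours) in enumerate(same_gender)]
    too_small = [name for _, name, hours in plan[1:]
                 if hours < MIN_HOURS_PER_EXTRA_SPEAKER]
    return plan, too_small


# Hypothetical corpus table built from the sizes in the question.
corpora = {
    "nancy": {"hours": 16.0, "gender": "F"},
    "audiobook": {"hours": 8.0, "gender": "F"},
    "male_5h": {"hours": 5.0, "gender": "M"},
    "target": {"hours": 0.5, "gender": "M"},
}

plan, too_small = plan_speakers(corpora, "M")
print(plan, too_small)
```

With the corpora from the question, a male-only plan leaves just the 5-hour American male set as speaker 0 plus the 30-minute target speaker.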

godspirit00 commented 2 years ago

@TheHonestBob Thanks for the reply! That means all the speakers in the dataset should be of the same gender?

TheHonestBob commented 2 years ago


Yes. If you have male and female datasets of about the same size, you can try training male and female speakers at the same time. In my experience, using the Baker dataset plus a 30-min male dataset, I got good results, although the male voice was not good. If you get good results, please share them.
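The rule of thumb here (mix genders only when the male and female portions are roughly the same size) can be written as a simple check. This is a minimal sketch: the function name and the 20% tolerance are arbitrary placeholders, not values from the thread.

```python
def can_mix_genders(hours_by_gender, tolerance=0.2):
    """Return True if male and female hours differ by at most `tolerance`,
    measured relative to the larger of the two. The default tolerance is a
    hypothetical choice."""
    male = hours_by_gender.get("M", 0.0)
    female = hours_by_gender.get("F", 0.0)
    larger = max(male, female)
    if larger == 0:
        # No data at all: nothing to mix.
        return False
    return abs(male - female) / larger <= tolerance


print(can_mix_genders({"F": 24.0, "M": 5.5}))  # False: sizes from the question
print(can_mix_genders({"F": 10.0, "M": 9.0}))  # True: within 20%
```

By this check, the ~24 hours of female data and ~5.5 hours of male data in the question are far from balanced, which matches the suggestion to train on one gender only.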

TensorSpeech / TensorFlowTTS, issue #729