adelacvg / ttts

Train the next generation of TTS systems.

Mozilla Public License 2.0

159 stars, 17 forks

performance #19

Open yiwei0730 opened 1 month ago

yiwei0730 commented 1 month ago

I have a few questions:

1. What data was used to train your latest Chinese, English, Japanese, and Korean demo model, and how many hours of audio are in the training set?
2. The demo audio seems to have some slight background noise. Can I continue training from your checkpoint (ckpt) to get better quality?
3. How good is the zero-shot performance of this model, and is it suitable for fine-tuning with a small amount of data?

adelacvg commented 1 month ago

1. I use the open-source Genshin Impact dataset, which conveniently includes data in four languages.
2. Noise might be difficult to entirely avoid, but I believe continuing training is feasible.
3. I think the zero-shot capability is quite good in terms of timbre, but due to the limited training data, the prosody similarity is still not ideal. Hence, fine-tuning with a small amount of data might not yield good results.

yiwei0730 commented 1 month ago

My third question was about fine-tuning: if I adapt the model with a small amount of data (under 2 minutes), can it reach good similarity and naturalness for that one speaker? I have tried this with NS2, but the similarity was only so-so and the output still had some noise.