How to train a new speaker?

tangfucius commented 2 months ago

Hi! I am from HK and just started learning about Cantonese TTS. My first goal is to train it on 林尚義's voice. I am starting with this repo as suggested in this repo's README, but that repo doesn't have a section for issues, so I am asking here instead.

I managed to get inference working in webui.py by following the instructions in this PR, and now I want to train a new speaker. I have a few questions:

Is the process to train a new speaker the one described in webui_preprocess.py? That is, do I need to prepare an esd.list in the following format: ****.wav|{说话人名}|{语言 ID}|{标签文本} (speaker name, language ID, transcript text)? Audio is easy to find, but is the annotated text needed too? Preparing the text would be quite time-consuming if there is a lot of audio data. Are there utilities that can help with that?
Based on your comment, should we work on the Style-Bert-VITS2 branch instead? I can also contribute to better docs if I get things going.
Does the framework support one-shot voice cloning, as claimed in cantonese.ai (unfortunately the web demo isn't available)? I assume not, but would like to confirm.

Thanks for your work in making Cantonese TTS open source! Hope I can contribute to this initiative going forward!
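
For reference, the esd.list format quoted above is one pipe-separated line per clip, with four fields. Below is a minimal sketch of a parser for such a line, useful for checking a hand-written list before preprocessing. The names `EsdEntry` and `parse_esd_line` are hypothetical helpers, not part of the repo, and the language ID `YUE` in the usage line is an assumption; check webui_preprocess.py for the value it actually expects.

```python
from typing import NamedTuple


class EsdEntry(NamedTuple):
    """One line of esd.list (hypothetical helper type, not from the repo)."""
    wav: str
    speaker: str
    language: str
    text: str


def parse_esd_line(line: str) -> EsdEntry:
    """Split one `wav|speaker|language|text` line into its four fields."""
    # maxsplit=3 keeps any "|" that appears inside the transcript in the text field.
    parts = [p.strip() for p in line.rstrip("\n").split("|", 3)]
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty fields, got: {line!r}")
    return EsdEntry(*parts)
```

For example, `parse_esd_line("0001.wav|lam|YUE|你好")` yields an entry whose `speaker` is `lam`, while a line with a missing field raises `ValueError`.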

Naozumi520 commented 2 months ago

Yes, it should follow this format, and annotations are required. You can use speech-to-text to save time, but it won't be as accurate as human transcription.

https://huggingface.co/alvanlii/whisper-small-cantonese

No, this is just an experiment, partly to see if it solves other users' problems. You should continue to use our hon9kon9nizer repository, as it is newer and more promising.
No, it doesn't support one-shot voice cloning. The example you linked is from a different person (or company?), and it's paid and doesn't seem to be open source.
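
As a rough illustration of the speech-to-text suggestion above, here is a sketch that drafts esd.list lines from a folder of clips. It assumes the linked model can be loaded through the Hugging Face `transformers` ASR pipeline; `build_esd_lines` and `whisper_transcriber` are hypothetical helper names, the language ID passed in is whatever webui_preprocess.py expects (not confirmed here), and writing only the file's base name in the first field is also an assumption. The drafts still need human proofreading.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def build_esd_lines(
    wav_paths: Iterable[str],
    speaker: str,
    language: str,
    transcribe: Callable[[str], str],
) -> List[str]:
    """Draft `wav|speaker|language|text` lines using any transcription callable."""
    lines = []
    for wav in wav_paths:
        # "|" is the field separator, so strip it from the machine transcript.
        text = transcribe(str(wav)).replace("|", " ").strip()
        if text:  # skip clips with an empty transcript
            lines.append(f"{Path(wav).name}|{speaker}|{language}|{text}")
    return lines


def whisper_transcriber(model_id: str = "alvanlii/whisper-small-cantonese"):
    """Return a callable that transcribes one audio file with the linked model."""
    # Imported lazily: requires `transformers`, a backend such as PyTorch, and ffmpeg.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return lambda path: asr(path)["text"]
```

Passing the transcriber in as a callable keeps the list-building step independent of the model, so a different recognizer or a table of manual transcripts can be swapped in without changing the rest.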

tangfucius commented 2 months ago

Thanks for the link! Will try that out to save time.
This seems to be the person behind cantonese.ai, and the TTS samples he provided sounded pretty decent, but he never released his models. Just wondering if you are aware of his work.

Naozumi520 / Bert-VITS2-Cantonese-Yue

How to train a new speaker? #6