Plachtaa / VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Apache License 2.0
4.69k stars, 704 forks

New speakers replace old ones or supplement the model speaker pool? #266

Open NikitaKononov opened 1 year ago

NikitaKononov commented 1 year ago

Hello, thank you for your great work.

I have a question. I have not clearly understood what happens to the speaker pool when we fine-tune a multi-speaker model that has, for example, 100 speakers with a dataset of 5 new speakers.

As I understand it, the model has a fixed capacity of 100 speakers in the config. So do the new speakers replace some of the old ones? Or does the model somehow gain the ability to synthesize 105 speakers instead of 100?

Looking forward to your answer. Thanks.

Plachtaa commented 1 year ago

There is no fixed limit on the number of speakers. Once you start fine-tuning, all old speakers are cleared and your designated speakers are added in their place. You can add as many speakers as you like.
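
To make the difference between "replace" and "supplement" concrete, here is a minimal sketch of rebuilding a speaker table. It is an illustration only, not this repo's code: the config keys `speakers` and `n_speakers`, the helper `rebuild_speaker_table`, and the `keep_old` flag are all assumptions made for the example. With `keep_old=False` it mirrors the behavior described above (old speakers cleared); with `keep_old=True` it shows what a 100 + 5 = 105 speaker table would look like.

```python
from copy import deepcopy


def rebuild_speaker_table(config, new_speakers, keep_old=False):
    """Return a copy of `config` with its speaker table rebuilt.

    Hypothetical helper: `speakers` maps a speaker name to an integer ID,
    and `n_speakers` is the size of that table.
    """
    cfg = deepcopy(config)
    # Start from the old names only when we want to supplement the pool.
    names = list(cfg["speakers"]) if keep_old else []
    for name in new_speakers:
        if name not in names:
            names.append(name)
    # Reassign contiguous IDs; old speakers keep their IDs when kept.
    cfg["speakers"] = {name: idx for idx, name in enumerate(names)}
    cfg["n_speakers"] = len(names)
    return cfg


base = {"n_speakers": 100, "speakers": {f"old_{i}": i for i in range(100)}}
new = [f"new_{i}" for i in range(5)]

replaced = rebuild_speaker_table(base, new)                  # old pool cleared
extended = rebuild_speaker_table(base, new, keep_old=True)   # old pool kept

print(replaced["n_speakers"], extended["n_speakers"])  # prints: 5 105
```

Note that extending the table only reserves IDs for the old speakers; it does not by itself guarantee that their voices survive fine-tuning.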

NikitaKononov commented 1 year ago

Once you start fine-tuning, all old speakers will be cleared and your designated speakers will be added in instead.

Thank you for your answer.

So there's no way to keep the old speakers in the model and add new ones? That's kind of sad.

Plachtaa commented 1 year ago

It is possible, but you need at least 2~3 audio clips per speaker for the old speakers, or else the model tends to gradually forget their voices. Hence, I'm sorry that this is not considered in this project 😄.
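
As a rough sketch of that idea (not something this project implements), one could build the fine-tuning list by taking all clips of the new speakers and mixing in a few clips per old speaker. The function name `build_finetune_list`, the dict-of-lists input format, and the file names below are hypothetical; the default of 3 clips per old speaker follows the "2~3" figure above.

```python
import random


def build_finetune_list(new_utts, old_utts, per_old_speaker=3, seed=0):
    """Mix all new-speaker clips with a few clips per old speaker.

    `new_utts` and `old_utts` map a speaker name to a list of audio paths
    (an assumed format for this example). Returns shuffled (path, speaker)
    pairs.
    """
    rng = random.Random(seed)
    # Keep every clip of the new speakers.
    mixed = [(path, spk) for spk, paths in new_utts.items() for path in paths]
    # Add a small sample for each old speaker so its voice keeps being seen.
    for spk, paths in old_utts.items():
        k = min(per_old_speaker, len(paths))
        mixed.extend((path, spk) for path in rng.sample(paths, k))
    rng.shuffle(mixed)
    return mixed


new_utts = {"new_a": ["a1.wav", "a2.wav", "a3.wav", "a4.wav"]}
old_utts = {
    "old_x": ["x1.wav", "x2.wav", "x3.wav", "x4.wav", "x5.wav"],
    "old_y": ["y1.wav", "y2.wav"],
}

mixed = build_finetune_list(new_utts, old_utts)
print(len(mixed))  # 4 new clips + 3 from old_x + 2 from old_y = 9
```

Whether 2~3 clips per speaker is enough in practice would depend on the checkpoint and the training schedule; the sketch only shows how such a mixed list could be assembled.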

NikitaKononov commented 1 year ago

It is possible, but you need at least 2~3 audio clips per speaker for the old speakers, or else the model tends to gradually forget their voices. Hence, I'm sorry that this is not considered in this project 😄.

So the main idea is something like that?

And it succeeds because the base checkpoint generalizes well and adapts well to new voices?

Plachtaa commented 1 year ago

Yes, it is exactly how it works.

NikitaKononov commented 1 year ago

Yes, it is exactly how it works.

Thank you for your time; you've kindly answered all my questions. Have a nice day.