Voice cloning attempts

Hi @Tomiinek, first of all, happy New Year! I was able to replicate your generate_switching model, with the main German speaker 00-de reading French and the main French speaker 00-fr reading German (same as on your demo page). It works pretty well.

Now I am trying to add another speaker, say 99-fr, with "not so many examples" (a few hundred). I am able to make 99-fr speak French quite well, but it does not work in other languages.

I also tried with M-AILABS and did not get better results.

Before testing further, I would like to better understand two things. First, what is the impact of setting the language embedding to zero (in generate_switching), and what would happen if it were set to something else? Second, what is the impact of the speaker embedding dimension (currently 32), for instance if we set it to 128?

More generally, I am wondering whether having two dominant speakers for a given language can confuse the model.

Tomiinek / Multilingual_Text_to_Speech

Voice cloning attempts #70