Tomiinek / Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
MIT License
826 stars, 157 forks

Voice cloning attempts #70

vince62s closed this issue 2 years ago

vince62s commented 2 years ago

Hi @Tomiinek, first of all, happy new year! I was able to replicate your generate_switching model, including having the main German speaker 00-de read French and the main French speaker 00-fr read German (same as on your demo page). It works pretty well.

Now I am trying to add another speaker, say 99-fr, with relatively few examples (a few hundred). I can make 99-fr speak French quite well, but it does not work in other languages.

I also tried with the M-AILABS data and did not get better results.

Before testing further, I would like to better understand two things. First, what is the impact of the language embedding being set to zero (in generate_switching)? In other words, what happens if it is set to something else? Second, what is the impact of the speaker embedding dimension (currently 32)? For instance, what if we set it to 128?

Altogether, I am wondering whether having two dominant speakers for a given language can confuse the model.
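
To make the two embedding questions concrete, here is a minimal, framework-free sketch. It is not the repository's code. It assumes that speaker and language vectors are simply concatenated to every encoder frame, that "set to zero" means a zero-dimensional language embedding, and that the encoder output width is 512. The names `make_embedding_table` and `condition` are hypothetical helpers made up for this illustration.

```python
import random


def make_embedding_table(names, dim, seed=0):
    """Hypothetical lookup table: name -> list of `dim` random floats."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.1) for _ in range(dim)] for name in names}


def condition(encoder_frames, speaker_vec, language_vec):
    """Append the speaker and language vectors to every encoder frame."""
    return [frame + speaker_vec + language_vec for frame in encoder_frames]


speakers = ["00-de", "00-fr", "99-fr"]
languages = ["de", "fr"]
encoder_dim = 512  # assumed encoder output width, not taken from the repo
frames = [[0.0] * encoder_dim for _ in range(3)]  # three dummy encoder frames

for speaker_dim in (32, 128):
    speaker_table = make_embedding_table(speakers, speaker_dim)
    # Dimension 0 gives empty vectors, so the language contributes nothing here.
    language_table = make_embedding_table(languages, 0)
    conditioned = condition(frames, speaker_table["99-fr"], language_table["de"])
    table_size = len(speakers) * speaker_dim
    print(f"speaker_dim={speaker_dim}: decoder input width={len(conditioned[0])}, "
          f"speaker table parameters={table_size}")
```

Under these assumptions, going from 32 to 128 only widens the decoder input and enlarges the speaker table. Whether that helps a speaker with a few hundred utterances generalize to other languages is exactly the open question here, and the sketch does not answer it.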

padmanabhankrishnamurthy commented 2 years ago

Hi,

Were you able to find out what the impact of the speaker embedding dimension was, and/or successfully add 99-fr?

Thanks!