audeering / shift

Contribution to https://shift-europe.eu

This seems amazing, but only English... #5

Open juangea opened 1 month ago

juangea commented 1 month ago

Hi there.

StyleTTS and this project seem amazing, but I think that for some of us it is very hard, or nearly impossible, to train a new language. It would be great to have more languages. Or maybe I'm wrong and training a new language is not that hard.

I have one 4090 and have never trained a model on a new language. I would be happy to help and share a model, but I can't keep the 4090 running 7 days a week for several weeks because I use it for work.

Are there plans to add / train more languages?

dkounadis commented 1 month ago

Hello,

SHIFT TTS builds on an observed phenomenon: if you feed StyleTTS2 a style (speaker/reference) taken from speech sped up 4x, its output becomes very emotional.

The figures below show how much the emotion probability increases when using 4x-speed styles instead of 1x-speed styles, i.e. natural speech styles.
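To make the "4x speed" idea concrete, here is a minimal sketch of speeding up a reference waveform before it is used to compute a style. This is an assumption for illustration only: it uses plain linear-interpolation resampling (which raises the pitch along with the tempo), and the thread does not say how the speed-up is actually done in this repo. The `speed_up` function and the synthetic `reference` signal are hypothetical names, not part of StyleTTS2 or this repo.

```python
import math


def speed_up(samples, factor=4.0):
    """Return `samples` played `factor` times faster.

    Hypothetical helper: plain linear-interpolation resampling, so the
    pitch rises together with the tempo. Not taken from the repo.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Fractional position in the original signal for output sample i.
        pos = min(i * factor, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


# Synthetic stand-in for a reference recording: 1 s of a 220 Hz tone at 24 kHz.
sample_rate = 24000
reference = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]

# The sped-up signal is what would be passed on as the style reference.
fast_reference = speed_up(reference, 4.0)
```

With a factor of 4 the sped-up signal has a quarter of the original samples. A pitch-preserving time stretch would be a different technique and may give different styles.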

[Figure: fig_english_WIN=40_HOP=10_HFdisc]

[Figure: fig_foreign_WIN=40_HOP=10_HFdisc]

In this repo we have pre-generated 134 interesting styles for StyleTTS2.

You can also use those styles with a non-English StyleTTS2 (e.g. this issue).

juangea commented 1 month ago

Thanks @dkounadis.

I really hope you can train a multi-language model; it would be the most useful TTS out there. So far the best solution for getting something emotional is Bark, but it's not very controllable and it's a bit random.

No other open-source option compares when it comes to multilingual support, so this could be awesome. :)