coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
33.41k stars 4.06k forks

Is there any plan to release expressive TTS like SeamlessExpressive model from Meta? #3363

Closed furqan4545 closed 7 months ago

furqan4545 commented 9 months ago

🚀 Feature Description Hi, is there any plan to release an expressive TTS model like Meta's SeamlessExpressive? That is, a model that can keep the underlying tonality, breaks, and expressions of the speaker in the generated output. I think it would be great if you could build a model like that, or at least provide some training code that we can use to train our own model.

Zibri commented 9 months ago

This would be great. For example, in tts_v2 it would be great to use two types of speaker_wavs: one for tone and voiceprint, and another for cadence, speed, and emotion. This would allow using one voice to "dub" another. Something similar to this: https://www.respeecher.com/

erogol commented 9 months ago

I assume you mean speech-to-speech models like voice conversion. Correct?

furqan4545 commented 9 months ago

@erogol yes, but the speech-to-speech model should maintain the underlying emotions of the original audio during conversion.

stale[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.