Closed furqan4545 closed 7 months ago
This would be great... for example, in tts_v2 it would be useful to accept two kinds of `speaker_wav` inputs: one for tone and voiceprint, and another for cadence, speed, and emotion. That would make it possible to use one voice to "dub" another. Something similar to this: https://www.respeecher.com/
I assume you mean speech-to-speech models like voice conversion. Correct?
@erogol yes, but the speech-to-speech model should preserve the underlying emotions of the original audio during conversion.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Feature Description

Hi, is there any plan to release an expressive TTS model like Meta's SeamlessExpressive? That is, a model that can keep the underlying tonality, pauses, and expressiveness of the speaker in the generated output. It would be great if you could build a model like that, or at least provide training code we could use to train our own.