Closed: Simbaprince closed this issue 2 weeks ago
Some of the things you mention are not necessarily emotions, but you can achieve different speaking styles by passing a reference audio clip of the desired style and speaker to the `set_utterance_embedding` method of the inference interface.
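As a rough sketch of that workflow: prepare one reference clip per style, then condition the model on each clip before synthesizing. Note that the exact class name (`ToucanTTSInterface`), the `read_to_file` call, and the file paths below are assumptions for illustration and may differ between IMS-Toucan versions; check the repository's `InferenceInterfaces` for the current API.

```python
# Styles the question asks about; each needs its own reference recording.
STYLES = ["angry", "sad", "excited", "cheerful", "shouting",
          "whispering", "terrified", "friendly", "unfriendly",
          "hopeful", "normal"]

# Hypothetical reference clips: one recording of the target speaker
# speaking in each style (paths are placeholders, not real files).
REFERENCE_CLIPS = {style: f"references/{style}.wav" for style in STYLES}


def synthesize_all_styles(tts, text, clips=REFERENCE_CLIPS):
    """Condition the TTS model on each reference clip, then synthesize.

    `tts` is assumed to be an IMS-Toucan inference interface instance
    (e.g. ToucanTTSInterface); method names are assumptions and may
    differ between versions of the repository.
    """
    for style, clip in clips.items():
        # set_utterance_embedding derives a style/speaker embedding
        # from the reference audio and uses it for later synthesis.
        tts.set_utterance_embedding(clip)
        # Write one output file per style (assumed helper method).
        tts.read_to_file(text_list=[text],
                         file_location=f"out_{style}.wav")
```

The key point is that the "parameter" controlling the style is not a numeric knob but the reference audio itself: the closer the clip matches the desired style and speaker, the closer the output should be.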
Versions are cumulative, so the most recent version has the best quality yet.
How do I get the correct parameter values in IMS-Toucan for several emotional voices, such as angry, sad, excited, cheerful, shouting, whispering, terrified, friendly, unfriendly, hopeful, and normal?