Add support for additional TTS integrations through non-Microsoft focused SpeechService interface

druggedhippo commented 2 years ago

EDDI currently uses whichever built-in Windows TTS voices are installed. Unfortunately, the built-in Windows TTS voices are not particularly good.

This feature request asks for a more modular SpeechService class that lets other speech engines plug in. These engines would not rely on the Windows TTS interfaces, but would provide the same WAV stream that the existing class uses.

Examples of other engines could include (but are not limited to):

Amazon Polly - https://ai-service-demos.go-aws.com/polly
Google - https://cloud.google.com/text-to-speech
Microsoft Azure - https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/
Different versions of the SAPI interface
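
To make the request concrete, here is a minimal sketch of what such a plugin contract could look like. It is written in Python for brevity and is not EDDI's actual C# code; `SpeechEngine`, `ToneEngine`, `register`, and `speak` are hypothetical names chosen for illustration. The only requirement taken from the request itself is that every engine returns a WAV stream, so the service does not care where the audio came from.

```python
import io
import math
import struct
import wave
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical plugin contract: text in, complete WAV file bytes out."""

    name: str

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes:
        """Return a full WAV file (header plus samples) for the given text."""


class ToneEngine(SpeechEngine):
    """Stand-in engine for offline testing: ignores the text, emits 0.1 s of tone."""

    name = "tone"

    def synthesize(self, text: str, voice: str) -> bytes:
        rate = 16000
        frame_count = rate // 10
        frames = b"".join(
            struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * i / rate)))
            for i in range(frame_count)
        )
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)      # mono
            w.setsampwidth(2)      # 16-bit samples
            w.setframerate(rate)
            w.writeframes(frames)
        return buf.getvalue()


class SpeechService:
    """Routes a prompt to whichever registered engine the caller selects."""

    def __init__(self) -> None:
        self._engines: dict[str, SpeechEngine] = {}

    def register(self, engine: SpeechEngine) -> None:
        self._engines[engine.name] = engine

    def speak(self, text: str, engine_name: str, voice: str = "default") -> bytes:
        engine = self._engines.get(engine_name)
        if engine is None:
            raise KeyError(f"no speech engine registered as {engine_name!r}")
        return engine.synthesize(text, voice)
```

A cloud engine (Polly, Google, Azure) would implement the same `synthesize` method by calling its own SDK and returning the resulting audio as WAV bytes; the rest of the pipeline would stay unchanged.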

As a proof of concept, here is an Amazon Polly implementation I created.

https://gist.github.com/druggedhippo/0a887973ee019dea1fc9e522f513b0f5

Example audio of Amazon Polly processing an EDDI TTS prompt in real time:

https://imgur.com/zyoWmQg

Tkael commented 2 years ago

Thank you for this. 😀

As you have effectively demonstrated, it is indeed possible to add additional speech synthesizers to EDDI, including for voices sourced from various cloud development environments (Azure, AWS, etc.).

These cloud voices typically require the user to provide specific credentials and are limited in some way (either as timed trials or by rendering only a limited number of words for free each month).

We're happy to support additional voices in EDDI but it is also important to note that voices from different sources do not always behave alike (in terms of SSML support, lexicons, etc).
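
One way to cope with uneven SSML support is a plain-text fallback for engines that cannot handle the markup. A minimal sketch in Python (not EDDI's C# code), assuming each engine can report whether it accepts SSML; `ssml_to_plain_text`, `prepare_prompt`, and the `engine_supports_ssml` flag are hypothetical names:

```python
import xml.etree.ElementTree as ET


def ssml_to_plain_text(ssml: str) -> str:
    """Drop all SSML markup and keep only the spoken text."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Not well-formed XML, so treat the input as plain text already.
        return ssml
    # itertext() yields element text and tails in document order;
    # split/join collapses the whitespace left behind by removed tags.
    return " ".join("".join(root.itertext()).split())


def prepare_prompt(prompt: str, engine_supports_ssml: bool) -> str:
    """Pass SSML through untouched, or flatten it for engines without SSML support."""
    return prompt if engine_supports_ssml else ssml_to_plain_text(prompt)
```

This loses pauses, emphasis, and phoneme hints, so it is only a lowest-common-denominator fallback; partial SSML support and lexicon differences would need per-engine handling.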

We would need to do some additional work to document the new capability and help users enter their credentials for accessing the voice. Some UI changes to allow capturing credentials in EDDI would probably also be very welcome.

Tkael commented 1 year ago

https://cloud.google.com/text-to-speech/docs/libraries

EDCD / EDDI

Add support for additional TTS integrations through non-Microsoft focused SpeechService interface #2379