Closed: fquirin closed this issue 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check our discussion channels.
No interest in Mary-TTS compatibility? I guess the most basic support where we just return the active voice (instead of all available voices) would still be better than nothing and easy to implement 🤔
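To illustrate how small that "basic" support could be: the server only needs to format one line per endpoint from the currently loaded model. A minimal sketch (the function names and the example model attributes are hypothetical, not the actual Coqui TTS API):

```python
def format_locales(locale: str) -> str:
    """Mary-TTS /locales response body: one supported locale per line."""
    return locale + "\n"

def format_voices(name: str, locale: str, gender: str) -> str:
    """Mary-TTS /voices response body: '[name] [locale] [gender]' per line.
    Only the single active voice is listed; names must not contain spaces."""
    return f"{name.replace(' ', '_').replace('/', '_')} {locale} {gender}\n"

# e.g. for a server started with an English LJSpeech model (values assumed):
print(format_voices("tts_models/en/ljspeech/tacotron2-DDC", "en_US", "female"))
```

Since Coqui model names contain slashes and the Mary-TTS format is space-separated, some sanitization like the `replace` calls above would be needed either way.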
Seems like there is no interest. I didn't get any comments on my PR either.
I think MaryTTS compatibility would be a great enhancement for Coqui TTS.
Your PR is not passing tests
I have no idea about Mary-TTS, though. It'd be better to get a review from someone who is familiar with it.
That was not my PR, but I'll try to make a new one, probably next month. The code in the PR can be reused for "basic" support. Any "advanced" support is probably not possible the way the server works (see concerns above).
I've implemented basic support now 🙂: https://github.com/coqui-ai/TTS/pull/2352
PR went into dev. Thanks @fquirin for the PR.
Many screen-readers and programs like Mycroft, SEPIA, Home Assistant, etc. support the Mary-TTS HTTP API introduced many years ago to communicate with a TTS system. Coqui TTS would work out-of-the-box with all of these programs if the server could offer a compatible API. I've listed some endpoints here (old Javadocs here). The most important ones (all HTTP GET) are probably:
- `/locales`: returns a list of supported locales in the format `[locale]\n...`
- `/voices`: returns a list of supported voices in the format `[name] [locale] [gender]\n...`. 'name' can be anything without spaces(!), 'locale' is e.g. en_US or de_DE (probably this) and 'gender' is female/male but could be extended with other/custom/diverse or whatever.
- `/process?INPUT_TEXT=..text..&INPUT_TYPE=TEXT&LOCALE=[locale]&VOICE=[name]&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE`: processes the text and returns a WAV file. We can probably ignore `INPUT_TYPE`, `OUTPUT_TYPE` and `AUDIO` as I've never seen any program using a different setting.

There is an existing PR by @mhetzi that implemented a bit of the `/process` endpoint: https://github.com/coqui-ai/TTS/pull/2162 . We can build on that.

One issue to solve: the Coqui TTS server usually starts with a specific model pre-loaded and does not support hot-swapping of voices. So the questions are:

- Do we support the `VOICE` parameter of the Mary-TTS `/process` endpoint and set an individual voice for each request?
- Or do we just return the active voice for `/voices`, the active language for `/locales`, and ignore the parameter for `/process`?

Any thoughts about that? 🙂
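For reference, this is roughly how a client like SEPIA or Mycroft builds the `/process` call. A sketch only; the base URL/port and the example voice name are assumptions, not values from Coqui TTS:

```python
from urllib.parse import urlencode

def build_process_url(base_url, text, locale, voice=None):
    """Build a Mary-TTS style /process GET URL that returns a WAV file."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",    # clients virtually always send TEXT ...
        "LOCALE": locale,
        "OUTPUT_TYPE": "AUDIO",  # ... and AUDIO/WAVE_FILE
        "AUDIO": "WAVE_FILE",
    }
    if voice:
        # May simply be ignored by the server if voice hot-swapping
        # isn't supported (the "basic" option discussed above).
        params["VOICE"] = voice
    return f"{base_url}/process?{urlencode(params)}"

# 59125 is the default Mary-TTS port; a Coqui server would use its own.
print(build_process_url("http://localhost:59125", "Hello world", "en_US"))
```

This also shows why ignoring `INPUT_TYPE`, `OUTPUT_TYPE` and `AUDIO` is probably safe: clients send the same fixed values every time, so only `INPUT_TEXT`, `LOCALE` and optionally `VOICE` actually vary.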