Closed: fquirin closed this issue 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check our discussion channels.
No interest in Mary-TTS compatibility? I guess the most basic support where we just return the active voice (instead of all available voices) would still be better than nothing and easy to implement 🤔
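To illustrate how small that "basic" support could be: the server only needs to format one line per endpoint from the currently loaded model. A minimal sketch (the function names and the example model attributes are hypothetical, not the actual Coqui TTS API):

```python
def format_locales(locale: str) -> str:
    """Mary-TTS /locales response body: one supported locale per line."""
    return locale + "\n"

def format_voices(name: str, locale: str, gender: str) -> str:
    """Mary-TTS /voices response body: '[name] [locale] [gender]' per line.
    Only the single active voice is listed; names must not contain spaces."""
    return f"{name.replace(' ', '_').replace('/', '_')} {locale} {gender}\n"

# e.g. for a server started with an English LJSpeech model (values assumed):
print(format_voices("tts_models/en/ljspeech/tacotron2-DDC", "en_US", "female"))
```

Since Coqui model names contain slashes and the Mary-TTS format is space-separated, some sanitization like the `replace` calls above would be needed either way.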
Seems like there is no interest. I didn't get any comments on my PR either.
I think MaryTTS compatibility would be a great enhancement for Coqui TTS.
Your PR is not passing tests
I have no idea about Mary-TTS, though. It'd be better to get a review from someone who is familiar with it.
That was not my PR, but I'll try to make a new one, probably next month. The code in the PR can be reused for "basic" support. Any "advanced" support is probably not possible the way the server works (see concerns above).
I've implemented basic support now 🙂: https://github.com/coqui-ai/TTS/pull/2352
PR went into dev. Thanks @fquirin for the PR.
Many screen-readers and programs like Mycroft, SEPIA, Home Assistant, etc. support the Mary-TTS HTTP API introduced many years ago to communicate with a TTS system. Coqui TTS would work out-of-the-box with all of these programs if the server could offer a compatible API. I've listed some endpoints here (old Javadocs here). The most important ones (all HTTP GET) are probably:
- `/locales`: returns a list of supported locales in the format `[locale]\n...`
- `/voices`: returns a list of supported voices in the format `[name] [locale] [gender]\n...`. 'name' can be anything without spaces(!), 'locale' is e.g. en_US or de_DE (probably this) and 'gender' is female/male but could be extended with other/custom/diverse or whatever.
- `/process?INPUT_TEXT=..text..&INPUT_TYPE=TEXT&LOCALE=[locale]&VOICE=[name]&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE`: processes the text and returns a WAV file. We can probably ignore `INPUT_TYPE`, `OUTPUT_TYPE` and `AUDIO` as I've never seen any program using a different setting.

There is an existing PR by @mhetzi that implemented a bit of the `/process` endpoint: https://github.com/coqui-ai/TTS/pull/2162 . We can build on that.

One issue to solve: the Coqui TTS server usually starts with a specific model pre-loaded and does not support hot-swapping of voices. So the questions are:

- Do we support the `VOICE` parameter of the Mary-TTS `/process` endpoint and set an individual voice for each request?
- Or do we just return the active voice for `/voices`, the active language for `/locales`, and ignore the parameter for `/process`?

Any thoughts about that? 🙂
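For reference, this is roughly how a client like SEPIA or Mycroft builds the `/process` call. A sketch only; the base URL/port and the example voice name are assumptions, not values from Coqui TTS:

```python
from urllib.parse import urlencode

def build_process_url(base_url, text, locale, voice=None):
    """Build a Mary-TTS style /process GET URL that returns a WAV file."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",    # clients virtually always send TEXT ...
        "LOCALE": locale,
        "OUTPUT_TYPE": "AUDIO",  # ... and AUDIO/WAVE_FILE
        "AUDIO": "WAVE_FILE",
    }
    if voice:
        # May simply be ignored by the server if voice hot-swapping
        # isn't supported (the "basic" option discussed above).
        params["VOICE"] = voice
    return f"{base_url}/process?{urlencode(params)}"

# 59125 is the default Mary-TTS port; a Coqui server would use its own.
print(build_process_url("http://localhost:59125", "Hello world", "en_US"))
```

This also shows why ignoring `INPUT_TYPE`, `OUTPUT_TYPE` and `AUDIO` is probably safe: clients send the same fixed values every time, so only `INPUT_TEXT`, `LOCALE` and optionally `VOICE` actually vary.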