daswer123 / xtts-api-server

A simple FastAPI Server to run XTTSv2
MIT License
372 stars · 85 forks

Feature Request: MaryTTS Compatibility #57

Open ther3zz opened 7 months ago

ther3zz commented 7 months ago

Hello,

Would it be possible to add MaryTTS compatibility to this (similar to what coqui-tts has)?

The specific intent here is to provide compatibility with Home Assistant.

neowisard commented 3 months ago

@daswer123 Could you please estimate the complexity of adding this endpoint to your product, and point out where it should go? I could try implementing it with DeepSeek Coder in PyCharm.

About the last endpoint in this API: it is the only thing I and the Home Assistant community need. Home Assistant is a smart-home product with a handy assistant; it requires STT, an LLM, function calling, and TTS, preferably behind an OpenAI-compatible API. STT: whisper.cpp. LLM: llama.cpp. TTS: still an open question (alltalk_tts has a lot of overhead, LocalAI is buggy, Silero is deprecated).

`/process?INPUT_TEXT=..text..&INPUT_TYPE=TEXT&LOCALE=[locale]&VOICE=[name]&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE` — processes the text and returns a WAV file. We can probably ignore INPUT_TYPE, OUTPUT_TYPE, and AUDIO, as I have never seen a program use any other settings.

ther3zz commented 3 months ago


@neowisard There's a different project (openedai-speech) which works well with this Home Assistant HACS integration: an openai_tts fork.

If you want to be able to type your own model/voice values, take a look at this openai_tts PR.
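To make the alternative concrete: openedai-speech exposes an OpenAI-compatible speech endpoint, so a client (such as the openai_tts integration) would POST something like the following. The base URL, model name `tts-1`, and voice `alloy` are illustrative assumptions following the OpenAI audio API conventions, not values taken from this thread:

```python
# Sketch of the request shape an OpenAI-compatible TTS client sends.
# Base URL, model, and voice names are assumptions for illustration.
import json
import urllib.request

def build_speech_request(base_url: str, text: str, voice: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /v1/audio/speech endpoint."""
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Usage (against a locally running openedai-speech instance, port assumed):
# req = build_speech_request("http://localhost:8000", "Hello", "alloy")
# with urllib.request.urlopen(req) as resp:
#     audio = resp.read()  # raw audio bytes
```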

neowisard commented 3 months ago

Thx! Just to be clear: I have tested both APIs, and even though they use almost identical engines and models, xtts-api-server with DeepSpeed (12–13 s) on my Tesla P40 is slightly faster than openedai-speech with DeepSpeed enabled (18–21 s).
[Screenshots: benchmark timings, 2024-07-05]

Both also have some memory leak (on vGPU). I forked and tweaked it for my own use; it is in my repo.