av / harbor

Effortlessly run LLM backends, APIs, frontends, and services with one command.
https://github.com/av/harbor
Apache License 2.0
202 stars 12 forks

Support Standalone Speech to Text Backends #22

Open maeyounes opened 4 days ago

maeyounes commented 4 days ago

Hello,

It seems that the project only supports text-to-speech backends such as Parler and openedai-speech (which offers Piper and Coqui TTS). However, there are no speech-to-text backends (I think the only ones that exist are those offering Whisper). Some frontends like Open WebUI have Whisper integrated, but others like LibreChat do not. I think it would add to the flexibility of the project if a Whisper server could be offered as a standalone backend.

I wonder if you could add support for STT modules and maybe also allow choosing between the different model sizes.

After some digging, I found that, similar to Parler TTS, this project https://github.com/fedirz/faster-whisper-server also offers an OpenAI-compatible server with a Docker image, so it shouldn't be hard to add. While it is not the original Whisper implementation, it is the same one used by Open WebUI, according to https://github.com/open-webui/open-webui/blob/b64c9d966a6498d4f2ffe12ab1498fab298afb37/Dockerfile#L129 .
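
For what it's worth, a frontend that already speaks the OpenAI audio API could point at such a server directly. A minimal sketch of the client side (the base URL, port, and model id below are my assumptions for illustration, not anything Harbor or faster-whisper-server guarantees):

```python
# Sketch: transcribing a file against an OpenAI-compatible STT server
# running locally. Adjust base_url and model to whatever the backend
# actually exposes; these values are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed",                 # local servers typically ignore the key
)

with open("sample.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-small",  # hypothetical model id; any supported size
        file=audio,
    )

print(transcript.text)
```

Choosing between model sizes would then mostly be a question of which model id the backend is configured to load (or the one passed in the request).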

Best regards, MY

av commented 4 days ago

Yes, I agree that STT is a missing piece. It makes perfect sense to have it available as a service. I also hope that openedai-speech (or another project that abstracts over multiple engines) will allow for a more encompassing setup later.