It seems that the project only supports text-to-speech backends such as Parler and openedai-speech (which offers Piper and Coqui TTS). However, there are no speech-to-text backends (I think the only ones that exist are those offering Whisper). Some frontends like Open WebUI have Whisper integrated, but others like LibreChat do not. I think it would add to the flexibility of the project if a Whisper server could be a standalone backend.
I wonder if you could add support for STT modules, and maybe also allow choosing between the different model sizes.
Yes, I agree that STT is a missing piece. It makes perfect sense to have it available as a service. I also hope that openedai-speech (or another abstraction project) will allow for a more encompassing setup later.
Hello,
After some digging, I found that, similar to Parler TTS, the project https://github.com/fedirz/faster-whisper-server also offers an OpenAI-compatible server with a Docker image, so it shouldn't be hard to add. While it is not the original Whisper model, it is the same one used by Open WebUI, according to https://github.com/open-webui/open-webui/blob/b64c9d966a6498d4f2ffe12ab1498fab298afb37/Dockerfile#L129 .
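Since it ships a Docker image, it could probably be wired in like the existing TTS backends. Just as a rough sketch (the exact image tag, port, and environment variables are assumptions on my part and would need to be checked against the faster-whisper-server README), a compose service might look something like:

```yaml
# Hypothetical docker-compose fragment for a standalone Whisper STT backend.
# Image name is from the faster-whisper-server repo; the port and any
# model-selection environment variable should be verified against its docs.
services:
  faster-whisper:
    image: fedirz/faster-whisper-server:latest-cpu  # a CUDA variant may also exist
    ports:
      - "8000:8000"  # assumed default port
    volumes:
      - whisper-cache:/root/.cache  # persist downloaded model weights (path assumed)

volumes:
  whisper-cache:
```

Because the server claims OpenAI compatibility, a frontend like LibreChat should then be able to point its transcription endpoint at `http://faster-whisper:8000/v1/audio/transcriptions` and pass the desired model size as the standard `model` form field.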
Best regards, MY