av / harbor

Effortlessly run LLM backends, APIs, frontends, and services with one command.
https://github.com/av/harbor
Apache License 2.0
202 stars 12 forks

Support Standalone Speech to Text Backends #22

Open maeyounes opened 4 days ago

maeyounes commented 4 days ago

Hello,

It seems that the project only supports text-to-speech backends such as Parler and openedai-speech (which offers Piper and Coqui TTS). However, there are no speech-to-text backends (I think the only ones that exist are those offering Whisper). Some frontends like Open WebUI have Whisper integrated, but others like LibreChat do not. I think it would add to the flexibility of the project if a Whisper server could be offered as a standalone backend.

I wonder if you could add support for STT modules and maybe also allow choosing between the different model sizes.

After some digging, I found that, similar to Parler TTS, this project https://github.com/fedirz/faster-whisper-server also offers an OpenAI-compatible server with a Docker image, so it shouldn't be hard to add. While it is not the original Whisper implementation, it is the same one used by Open WebUI, according to https://github.com/open-webui/open-webui/blob/b64c9d966a6498d4f2ffe12ab1498fab298afb37/Dockerfile#L129 .
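
For what it's worth, a frontend that already speaks the OpenAI audio API could point at such a server directly. A minimal sketch of the client side (the base URL, port, and model id below are my assumptions for illustration, not anything Harbor or faster-whisper-server guarantees):

```python
# Sketch: transcribing a file against an OpenAI-compatible STT server
# running locally. Adjust base_url and model to whatever the backend
# actually exposes; these values are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed",                 # local servers typically ignore the key
)

with open("sample.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-small",  # hypothetical model id; any supported size
        file=audio,
    )

print(transcript.text)
```

Choosing between model sizes would then mostly be a question of which model id the backend is configured to load (or the one passed in the request).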

Best regards, MY

av commented 4 days ago

Yes, I agree that STT is a missing piece. It makes perfect sense to have it available as a service. I also hope that openedai-speech (or another project that abstracts over multiple engines) will allow for a more encompassing setup later.