This module provides pseudo-streaming speech-to-text using Vosk and Whisper. To avoid adding too much to server.py, the STT modules live in their own files and the API routes are registered with "add_url_rule".
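As a rough illustration of the "add_url_rule" approach, the sketch below registers routes from outside server.py. The view functions, the `register_stt_routes` helper, the POST method, and the `/api/stt/whisper/record` path are assumptions made for the example, not the PR's actual code.

```python
from typing import Any, Iterable


def vosk_record() -> str:
    """Hypothetical view: would capture speech and return the Vosk transcript."""
    return "vosk transcript placeholder"


def whisper_record() -> str:
    """Hypothetical view: would return the Whisper transcript."""
    return "whisper transcript placeholder"


def register_stt_routes(app: Any, enabled_modules: Iterable[str]) -> None:
    """Attach STT routes to a Flask-like app without editing its route section.

    `app` only needs an `add_url_rule` method, as flask.Flask provides.
    The HTTP method and the whisper path are assumptions for this sketch.
    """
    enabled = set(enabled_modules)
    if "vosk-stt" in enabled:
        app.add_url_rule(
            "/api/stt/vosk/record", view_func=vosk_record, methods=["POST"]
        )
    if "whisper-stt" in enabled:
        app.add_url_rule(
            "/api/stt/whisper/record", view_func=whisper_record, methods=["POST"]
        )
```

In server.py this would be a single call such as `register_stt_routes(app, modules)` after the Flask app is created, which keeps the STT logic in its own files.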
Features
STT providers
Vosk: open-source STT whose library natively supports streaming voice recognition in real time.
Whisper: open-source STT with better accuracy than Vosk, but it needs speech detection done beforehand. Currently Vosk is used to cut the voice segment for Whisper.
What changed
Server arguments
vosk-stt: activates the Vosk module.
whisper-stt: activates the Whisper module.
stt-microphone-id: sets the input device for the sounddevice library. If not set, the default microphone is selected and the list of devices is printed.
stt-vosk-model-path: path to the Vosk model. If not given, the model is downloaded automatically and stored in the user cache folder.
stt-whisper-model-path: same, but for the Whisper model. The default models are the smallest English ones, about 100 MB.
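The arguments above could be declared roughly as follows. This is a minimal argparse sketch, not the actual server.py code: the comma-separated parsing of `--enable-modules`, the `None` defaults, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the STT-related server arguments (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="STT arguments (sketch)")
    parser.add_argument(
        "--enable-modules",
        # Assumed format: comma-separated module names, e.g. "vosk-stt,whisper-stt".
        type=lambda value: [m for m in value.split(",") if m],
        default=[],
        help="modules to activate, e.g. vosk-stt or whisper-stt",
    )
    parser.add_argument(
        "--stt-microphone-id",
        type=int,
        default=None,  # None means: use the default input device
        help="input device index for sounddevice",
    )
    parser.add_argument(
        "--stt-vosk-model-path",
        default=None,  # None means: download the model automatically
        help="path to a Vosk model",
    )
    parser.add_argument(
        "--stt-whisper-model-path",
        default=None,
        help="path to a Whisper model",
    )
    return parser
```

argparse accepts both `--stt-microphone-id=1` and `--stt-microphone-id 1`, and exposes the value as `args.stt_microphone_id`.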
New API routes
/api/stt/vosk/record: starts recording the user's microphone with the sounddevice library. Raw audio blocks of fixed size are pushed to a queue from a parallel callback thread. Vosk processes the queue block by block until it detects the end of speech. The finished transcript is returned as a string.
/api/stt/whisper/record: fairly trivial for now. It uses Vosk as above for speech capture, saves the complete audio to a file, and has Whisper process that file. The Whisper transcript is returned; the Vosk transcript is only printed as debug info.
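The queue-based capture loop described for the Vosk route can be sketched as below. It follows the pattern of Vosk's own microphone example, but the function names, the 16 kHz sample rate, the block size of 8000 frames, and the choice to skip empty results are assumptions, not the PR's code. The third-party imports sit inside the function so the block-processing logic can be read and exercised on its own.

```python
import json
import queue


def transcribe_until_end_of_speech(blocks, recognizer, timeout=5.0) -> str:
    """Feed queued audio blocks to the recognizer until an utterance ends.

    `recognizer` is expected to behave like vosk.KaldiRecognizer:
    AcceptWaveform(data) returns True at end of speech, and Result()
    returns a JSON string with a "text" field.
    """
    while True:
        data = blocks.get(timeout=timeout)  # raises queue.Empty if audio stalls
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:  # assumption: ignore utterances that decode to nothing
                return text


def record_with_vosk(model_path, device=None, samplerate=16000) -> str:
    """Capture the microphone and return one transcript (hypothetical helper)."""
    import sounddevice as sd  # imported lazily: needs an audio backend
    import vosk

    blocks = queue.Queue()

    def callback(indata, frames, time, status):
        # Runs in sounddevice's audio thread: just hand the raw bytes over.
        blocks.put(bytes(indata))

    recognizer = vosk.KaldiRecognizer(vosk.Model(model_path), samplerate)
    with sd.RawInputStream(samplerate=samplerate, blocksize=8000, device=device,
                           dtype="int16", channels=1, callback=callback):
        return transcribe_until_end_of_speech(blocks, recognizer)
```

Keeping the callback limited to a `put` call means the recognition work happens in the request thread, so slow decoding does not block audio capture.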
Requirements
packages
sounddevice (microphone audio capture)
vosk (for Vosk STT)
openai-whisper (for Whisper STT)
ffmpeg (supposed to be needed by Whisper; not sure whether it installs via pip or needs an external install, I have both)
Tests
Tested only on Windows 11
The audio recording of the last message is stored in a file "stt_test.wav", which can be used to assess audio quality or just to check that recording works.
Unplugging or plugging in an additional device should just raise a stream error that is caught during audio processing.
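For reference, writing captured raw blocks to a file like "stt_test.wav" can be done with the standard-library wave module. This is a sketch under assumptions: 16-bit mono audio at 16 kHz and the `save_blocks_as_wav` name are illustrative, not taken from the PR.

```python
import wave


def save_blocks_as_wav(path, blocks, samplerate=16000, channels=1, sample_width=2):
    """Write raw PCM blocks (bytes) to a WAV file.

    sample_width=2 means 16-bit samples, matching an int16 input stream.
    The sample rate and channel count are assumptions for this sketch.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(samplerate)
        for block in blocks:
            wav_file.writeframes(block)
```

The resulting file can be opened in any audio player to check the capture, or passed to Whisper for transcription.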
Example commands:
python server.py --enable-modules=vosk-stt
python server.py --enable-modules=whisper-stt
python server.py --enable-modules=whisper-stt --stt-microphone-id=1
python server.py --enable-modules=vosk-stt --stt-vosk-model-path=modules\stt\vosk-model-en-us-0.22 --stt-microphone-id 0