Revision of PR#84. Rather than resolve all the rebase conflicts, I injected the new parts directly into the neo branch.
This module provides speech-to-text for audio files sent by ST, using Vosk or Whisper. To avoid adding too much to server.py, the STT modules live in their own files and register their API routes with "add_url_rule".
Features
STT providers
Vosk: open-source STT whose library natively supports streaming voice recognition in real time.
Whisper: open-source STT with better accuracy than Vosk, but it needs speech detection done beforehand. Vosk is currently used to cut out the voice segments for Whisper.
What changed
Server arguments
vosk-stt: activates the Vosk module (passed as a value to --enable-modules).
whisper-stt: activates the Whisper module (passed as a value to --enable-modules).
stt-vosk-model-path: path to the Vosk model. If not given, the model is downloaded automatically and stored in the user cache folder.
stt-whisper-model-path: same, but for the Whisper model. The default models are the smallest English ones, about 100 MB each.
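As a rough illustration of how these options fit together, here is a minimal argparse sketch. It is not the actual parser in server.py: the `build_parser` helper is hypothetical, and treating `--enable-modules` as a comma-separated list is an assumption. Only the option and module names come from this description.

```python
import argparse


def _split_modules(value: str) -> list:
    # Assumption: several modules can be enabled with a comma-separated list.
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the STT-related options."""
    parser = argparse.ArgumentParser(description="STT options (illustrative sketch)")
    parser.add_argument(
        "--enable-modules",
        type=_split_modules,
        default=[],
        help="modules to activate, e.g. vosk-stt or whisper-stt",
    )
    parser.add_argument(
        "--stt-vosk-model-path",
        default=None,
        help="path to a Vosk model; None means 'download to the user cache'",
    )
    parser.add_argument(
        "--stt-whisper-model-path",
        default=None,
        help="path to a Whisper model; None means 'download to the user cache'",
    )
    return parser


# Example: enable Vosk with a local model, leave Whisper unset.
example_args = build_parser().parse_args(
    ["--enable-modules=vosk-stt", "--stt-vosk-model-path=models/vosk"]
)
```

Leaving a model path unset (`None`) is how this sketch represents the "download automatically" case described above.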
New API routes
/api/speech-recognition/vosk/process-audio: processes the audio file sent in the request using Vosk. The file must first be converted to a proper WAV format using sounddevice. So far only Firefox sends a compatible file; Chrome and Edge send incompatible files.
/api/speech-recognition/whisper/process-audio: processes the audio file sent in the request using Whisper. No conversion is needed; Whisper handles Firefox, Chrome, and Edge files directly.
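For manual testing outside ST, a client could build a request along these lines. This is only a sketch under stated assumptions: the base URL and port, the multipart upload format, and the form field name `AudioFile` are all guesses and are not confirmed by this description. Only the two route paths are taken from it. The response format is not shown because it is not documented here.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:5100"  # assumption: host and port of the running server
FIELD_NAME = "AudioFile"            # assumption: form field the route reads the file from


def build_stt_request(provider: str, audio_bytes: bytes,
                      filename: str = "record.wav") -> urllib.request.Request:
    """Build (but do not send) a multipart POST for one of the two STT routes."""
    if provider not in ("vosk", "whisper"):
        raise ValueError(f"unknown STT provider: {provider!r}")

    # Hand-rolled multipart/form-data body with a single file part.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{FIELD_NAME}"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])

    url = f"{BASE_URL}/api/speech-recognition/{provider}/process-audio"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

With a server running, `urllib.request.urlopen(build_stt_request("whisper", data))` would send the request.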
Requirements
Packages
sounddevice (used together with wave to convert the WAV file into the proper format for Vosk)
vosk (for Vosk STT)
openai-whisper (for Whisper STT)
ffmpeg (supposedly needed by Whisper; not sure whether it installs via pip or needs an external install, I have both)
Tests
Tested only on Windows 11 with Firefox 115, Chrome 115, and Edge 115.
The received audio is saved to a file "stt_test.wav", which can be used to assess audio quality or simply to check that recording works.
python server.py --enable-modules=vosk-stt
python server.py --enable-modules=whisper-stt
python server.py --enable-modules=vosk-stt --stt-vosk-model-path=modules\stt\vosk-model-en-us-0.22
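To check the saved "stt_test.wav" after a test run, a small script using the standard wave module can report its basic properties. This is a sketch rather than part of the PR, and `describe_wav` is a hypothetical helper. The "mono 16-bit" flag reflects the input format that Vosk's example scripts expect, which is assumed here to apply to this module as well. A file that is not a readable WAV (as may happen with Chrome or Edge recordings) yields None.

```python
import wave
from typing import Optional


def describe_wav(path: str) -> Optional[dict]:
    """Return basic properties of a WAV file, or None if it is not a readable WAV."""
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            sample_width = wav.getsampwidth()  # bytes per sample
            rate = wav.getframerate()
            frames = wav.getnframes()
    except (wave.Error, EOFError):
        # Raised when the file lacks a valid RIFF/WAVE header or is truncated.
        return None

    return {
        "channels": channels,
        "sample_width_bytes": sample_width,
        "sample_rate_hz": rate,
        "duration_s": frames / rate if rate else 0.0,
        # Assumption: mono 16-bit PCM is the format Vosk wants as input.
        "mono_16bit": channels == 1 and sample_width == 2,
    }
```

Calling `describe_wav("stt_test.wav")` from the server's working directory gives a quick sanity check on the recording length and format.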