Open echoTab opened 1 year ago
What model are you running on the server?
To change everything to 16 kHz, you need to change both the server:
https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L95
and client
https://github.com/MaxVRAM/Vosk-VTT-Client/blob/main/vtt_client.py#L61
In general, we recommend sounddevice for microphone recording; we do not recommend pyaudio. We also recommend using our examples instead of external projects.
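For illustration, here is a rough sketch of the message sequence a client could send so that the declared rate matches the audio. It assumes the server accepts a JSON text frame of the form `{"config": {"sample_rate": N}}` before the audio and `{"eof": 1}` after it, which is what the vosk-server example clients appear to do. The helper names (`config_message`, `pcm_chunks`, `build_messages`) are hypothetical and not part of any library.

```python
import json

# Assumed to match the rate the model expects; not read from the server.
SAMPLE_RATE = 16000


def config_message(sample_rate: int) -> str:
    """Build the JSON text frame that declares the audio sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def pcm_chunks(pcm: bytes, sample_rate: int, seconds: float = 0.2):
    """Split 16-bit mono PCM into chunks of roughly `seconds` of audio."""
    chunk_bytes = int(sample_rate * seconds) * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def build_messages(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> list:
    """Return the ordered frames: config first, then audio, then end-of-stream."""
    messages = [config_message(sample_rate)]
    messages.extend(pcm_chunks(pcm, sample_rate))
    messages.append(json.dumps({"eof": 1}))
    return messages
```

Each item would then be sent over the websocket in order, with the audio recorded at the same rate that the config frame declares. If the recording rate and the declared rate differ, a mismatch like the one reported in this thread is the likely result.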
Thanks for your reply. I have been running with vosk-model-small-en-us-0.15, which I understand requires a 16 kHz sample rate. This may be a dumb question, but looking at the code of asr_server.py I realise that I may have been confusing 'model' and 'spk_model'. Could you please explain how these differ?
spk_model is for speaker identification (recognising who is speaking); model is the speech recognition model.
Thank you. It is working now.
I have vosk-server running on a VPS under Ubuntu 22.04, cloned from https://github.com/alphacep/vosk-server. I also have vtt_client.py running on Windows 10 via WSL2/Ubuntu, cloned from https://github.com/MaxVRAM/vosk_vtt_client.git. I had lots of problems getting pyaudio to work, but finally got it to run after installing it via conda (although vtt_client still throws many ALSA lib errors).
However, when I start vosk-server and then vtt_client, I get "sampling frequency mismatch, expected 16000, got 8000". I have tried hard-coding vosk-server to 8 kHz, and also tried hard-coding vtt_client to 16 kHz. Neither changed the error message. I also tried running the server with --allow{upsample,downsample}, but this did not help either.
I have run out of ideas. Are you able to help?