alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Vosk-server and vtt_client.py sample mismatch #235

Open echoTab opened 1 year ago

echoTab commented 1 year ago

I have vosk-server running on a VPS under Ubuntu 22.04, cloned from https://github.com/alphacep/vosk-server. I also have vtt_client.py running on Windows 10 via WSL2/Ubuntu, cloned from https://github.com/MaxVRAM/vosk_vtt_client.git. I had lots of problems getting pyaudio to work, but finally got it to run after installing it via conda (although vtt_client still throws many ALSA lib errors).

However, when I start vosk-server and then vtt_client, I get "sampling frequency mismatch, expected 16000, got 8000". I have tried hard-coding vosk-server to 8k, and also tried hard-coding vtt_client to 16k. Neither changed the error message. I also tried running the server with --allow{upsample,downsample}, but this did not help either.

I have run out of ideas. Are you able to help?

nshmyrev commented 1 year ago

What model are you running on the server?

To change everything to 16 kHz, you need to change both the server:

https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L95

and client

https://github.com/MaxVRAM/Vosk-VTT-Client/blob/main/vtt_client.py#L61

In general, we recommend sounddevice for microphone recording; we do not recommend pyaudio. We also recommend using our examples instead of external projects.
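
Editor's sketch, not part of the original reply: the advice above comes down to making the client and the server agree on one sample rate. The snippet below is a minimal illustration under stated assumptions. It assumes the server accepts a JSON config message of the form {"config": {"sample_rate": N}} before any audio, as the websocket examples in the vosk-server repository appear to do, and that the model expects 16000 Hz, as the error message in this thread suggests. The helper names (build_config_message, wav_sample_rate, rates_match, make_silent_wav) are hypothetical and exist only for this example.

```python
import io
import json
import wave

# Assumed model rate, taken from the "expected 16000" part of the error message.
MODEL_SAMPLE_RATE = 16000


def build_config_message(sample_rate: int) -> str:
    """Return the JSON text a client would send before streaming audio.

    The message shape is an assumption based on the vosk-server websocket examples.
    """
    return json.dumps({"config": {"sample_rate": sample_rate}})


def wav_sample_rate(wav_file) -> int:
    """Read the sample rate from a WAV path or a binary file-like object."""
    with wave.open(wav_file, "rb") as wf:
        return wf.getframerate()


def rates_match(client_rate: int, model_rate: int = MODEL_SAMPLE_RATE) -> bool:
    """True when the audio rate equals the rate the model is assumed to expect."""
    return client_rate == model_rate


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short mono 16-bit silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    rate = wav_sample_rate(make_silent_wav(8000))
    if not rates_match(rate):
        print(f"audio is {rate} Hz but the model is assumed to expect {MODEL_SAMPLE_RATE} Hz")
    print(build_config_message(MODEL_SAMPLE_RATE))
```

Checking the rate of the audio you are about to send, and sending that same rate in the config message, is one way to catch a mismatch on the client side before the recognizer reports it.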

echoTab commented 1 year ago

Thanks for your reply. I have been running with vosk-model-small-en-us-0.15, which I understand requires a 16k sample rate. This may be a dumb question, but looking at the code of asr_server.py, I realise that I may have been confusing 'model' and 'spk_model'. Could you please explain how these differ?

nshmyrev commented 1 year ago

spk_model is for voice recognition (speaker identity).

echoTab commented 1 year ago

Thank you. It is working now.