alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Vosk-server and vtt_client.py sample mismatch #235

Open: echoTab opened this issue 11 months ago

echoTab commented 11 months ago

I have vosk-server running on a VPS under Ubuntu 22.04, cloned from https://github.com/alphacep/vosk-server, and vtt_client.py running on Windows 10 via WSL2/Ubuntu, cloned from https://github.com/MaxVRAM/vosk_vtt_client.git. I had lots of problems getting pyaudio to work, but finally got it to run after installing it via conda (although vtt_client still throws many ALSA lib errors).

However, when I start vosk-server and then vtt_client, I get "sampling frequency mismatch, expected 16000, got 8000". I have tried hard-coding vosk-server to 8k, and also tried hard-coding vtt_client to 16k. Neither changed the error message. I also tried running the server with --allow{upsample,downsample}, but this did not help either.

I have run out of ideas. Are you able to help?

nshmyrev commented 11 months ago

What model are you running on the server?

To change everything to 16 kHz, you need to change both the server:

https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L95

and the client:

https://github.com/MaxVRAM/Vosk-VTT-Client/blob/main/vtt_client.py#L61

In general, we recommend sounddevice rather than pyaudio for microphone recording. We also recommend using our examples instead of external projects.
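
For anyone hitting the same mismatch, here is a minimal sketch of checking the audio rate on the client side before streaming. It assumes the model expects 16000 Hz (as the error message in this thread suggests) and that the server accepts a JSON text message of the form `{"config": {"sample_rate": ...}}` before the audio, which is the pattern used by the sample clients in the vosk-server repository. The helper names (`config_message`, `wav_sample_rate`, `make_silent_wav`, `check_rate`) are hypothetical and exist only for illustration.

```python
import io
import json
import wave

# Assumption: the model on the server expects 16 kHz audio.
MODEL_SAMPLE_RATE = 16000


def config_message(sample_rate: int) -> str:
    """Build the JSON text message announcing the audio sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate from the header of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getframerate()


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> bytes:
    """Create a short mono 16-bit silent WAV (demonstration data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buf.getvalue()


def check_rate(wav_bytes: bytes, expected: int = MODEL_SAMPLE_RATE) -> str:
    """Return the config message if the audio rate matches, else raise."""
    rate = wav_sample_rate(wav_bytes)
    if rate != expected:
        raise ValueError(f"audio is {rate} Hz but the model expects {expected} Hz")
    return config_message(rate)


if __name__ == "__main__":
    print(check_rate(make_silent_wav(16000)))
```

In a real client, the returned string would be sent as the first text frame over the WebSocket, followed by the raw audio bytes. Failing early on the client like this makes it obvious which side is still set to 8000 Hz.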

echoTab commented 11 months ago

Thanks for your reply. I have been running with vosk-model-small-en-us-0.15, which I understand requires a 16k sample rate. This may be a dumb question, but looking at the code of asr_server.py, I realise that I may have been confusing 'model' and 'spk_model'. Could you please explain how these differ?

nshmyrev commented 11 months ago

spk_model is for speaker recognition (identifying who is speaking); model is the speech recognition model.

echoTab commented 11 months ago

Thank you. It is working now.