alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Python-Websocket: Sampling frequency mismatch, expected 16000, got 8000 #96

Closed tomanwalker closed 3 years ago

tomanwalker commented 3 years ago

I followed the docs and files from https://github.com/alphacep/vosk-server/tree/master/websocket

I tried first on an RPi (2B), then on a VM (Lubuntu x64). Both gave the same result: a sampling frequency mismatch.

// ## Install
pip3 install --upgrade pip
pip3 install vosk
$ python3 --version
Python 3.8.5

// ## get model
wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
// ## edit server.py to have default model as "vosk-model-small-en-us-0.15" 

// ## Client
$ ./test.py test.wav

$ ./test.py test16k.wav

// ## Server
$ ./server.py 
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=10 max-active=3000 lattice-beam=2
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.107284 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from vosk-model-small-en-us-0.15/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:251) Loading HCL and G from vosk-model-small-en-us-0.15/graph/HCLr.fst vosk-model-small-en-us-0.15/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo vosk-model-small-en-us-0.15/graph/phones/word_boundary.int
Server starting, port = 2700
websocket.recv - start...
recognize - Init rec...
process_chunk - start...
ERROR (VoskAPI:MaybeCreateResampler():online-feature.cc:99) Sampling frequency mismatch, expected 16000, got 8000
Perhaps you want to use the options --allow_{upsample,downsample}
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
Aborted (core dumped)

tomanwalker commented 3 years ago

After trying out different combinations, I found that I needed to run test_words.py before test.py:

./test_words.py
./test.py test.wav

But then "phrase_list" gets emptied for every message, so "words" has to be sent again every time.

memarsh commented 3 years ago

I'm experiencing the same issue, and sending "words" again every time is not a practical solution. The 'words.txt' file already exists in the model repo, so the server should use that by default.

memarsh commented 3 years ago

Solution found: send the config command as in ./test_words.py, but don't include "phrase_list" or "words", e.g.: await websocket.send('''{"config" : {"sample_rate" : 16000.0 }}''')
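
For anyone who wants the config step spelled out, here is a minimal sketch of a client that sends the config message before any audio. It reads the rate from the WAV header instead of hard-coding 16000.0, which is a choice of this sketch and not something the comment above prescribes. The helper names (build_config_message, stream_wav), the 0.2 second chunk size and the final "eof" message are assumptions modelled on how the repository's test client appears to behave. The websocket argument is any object with async send() and recv(), for example a connection from the third-party websockets package.

```python
import json
import wave


def build_config_message(sample_rate):
    """Return the JSON config message that declares the audio sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


async def stream_wav(websocket, wav_path, chunk_seconds=0.2):
    """Send a WAV file to the server, declaring its sample rate first.

    `websocket` only needs async send() and recv() methods (hypothetical
    interface for this sketch; a `websockets` connection would fit).
    """
    results = []
    with wave.open(wav_path, "rb") as wf:
        rate = wf.getframerate()
        # Config goes first, without "phrase_list" or "words".
        await websocket.send(build_config_message(rate))
        frames_per_chunk = int(rate * chunk_seconds)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            await websocket.send(data)
            results.append(await websocket.recv())
    # Assumed end-of-stream marker, as used by the repo's test client.
    await websocket.send('{"eof" : 1}')
    results.append(await websocket.recv())
    return results
```

Declaring the file's real rate only helps if the model can accept that rate. If the mismatch error persists, the audio itself has to be resampled or a different model chosen.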

ArthurYoung1 commented 3 years ago

I've had this error using the Python API rather than a websocket. In my case, the problem was that the model was expecting 16000 Hz audio, and I gave it an 8000 Hz file.

I think it may be possible to resample an 8000 Hz file to 16000 Hz. Obviously you won't actually gain any audio information that way, but it could fix this error.
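
A rough sketch of that resampling idea, using only the Python standard library. It assumes 16-bit mono PCM WAV input and uses plain linear interpolation, which is crude compared with a proper resampler (a dedicated audio tool would give better quality). The function names resample_linear and resample_wav are made up for this example.

```python
import array
import sys
import wave


def resample_linear(samples, src_rate, dst_rate):
    """Resample a sequence of PCM samples by linear interpolation."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        left = int(pos)
        right = min(left + 1, last)        # clamp at the final sample
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


def resample_wav(src_path, dst_path, dst_rate=16000):
    """Write a copy of a 16-bit mono WAV file at dst_rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit mono PCM only")
        src_rate = src.getframerate()
        pcm = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":             # WAV data is little-endian
        pcm.byteswap()
    out = array.array("h", resample_linear(pcm, src_rate, dst_rate))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(dst_rate)
        dst.writeframes(out.tobytes())
```

Calling resample_wav("test.wav", "test16k_from8k.wav") (file names are placeholders) would produce a file whose header says 16000 Hz, which is what the recognizer checks. Recognition accuracy on upsampled narrowband audio may still be worse than with a model trained for 8000 Hz.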

Alternatively, different models available at https://alphacephei.com/vosk/models accept audio with different sample rates. For example, the model titled "vosk-model-en-us-0.21" accepts audio with an 8000 Hz sampling frequency as well as some other frequencies (e.g. 44.1 kHz).