Closed (seepine closed this 10 months ago)
The audio format is probably wrong: it must be strictly 16 kHz, 16-bit, mono. If the format is wrong, you get garbage.
For Chinese, FunASR should have the best accuracy: https://github.com/alibaba-damo-academy/FunASR/tree/main
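A quick way to verify the format requirement mentioned above is to read the WAV header before sending the file to the server. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of Vosk.

```python
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file looks fine."""
    problems = []
    # wave.open raises wave.Error if the file is not a WAV it can parse.
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        if wf.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
    return problems
```

Calling `check_wav_format("test-ja.wav")` returns an empty list when the file is 16 kHz, 16-bit, mono; pass `expected_rate=8000` if the server is set up for 8 kHz.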
I converted the audio with ffmpeg, which I believe gives strictly 16 kHz, 16-bit, mono:
ffmpeg -i "input.m4a" -ac 1 -ar 16000 -acodec pcm_s16le test-ja.wav
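If several files need converting, or the target rate has to be switched between 16 kHz and 8 kHz, the same ffmpeg invocation can be wrapped in a small script. This is only a sketch: the helper names `ffmpeg_args` and `convert` are hypothetical, and it assumes `ffmpeg` is on the PATH.

```python
import subprocess


def ffmpeg_args(src, dst, rate=16000):
    """Build the ffmpeg argument list for mono 16-bit PCM at the given rate."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", str(rate),
            "-acodec", "pcm_s16le", dst]


def convert(src, dst, rate=16000):
    """Run ffmpeg; check=True raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_args(src, dst, rate), check=True)
```

For example, `convert("input.m4a", "test-ja.wav")` runs the command shown above, and `convert("input.m4a", "test-ja-8k.wav", rate=8000)` produces an 8 kHz file instead.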
@nshmyrev Hi, is it a model problem? The demo at https://alphacephei.com/cn/ accurately recognizes what I'm saying. But when I download the 1.3GB model from https://alphacephei.com/vosk/models/vosk-model-cn-0.22.zip, it does not work well.
You can also try 8 kHz; maybe your vosk-server is configured for 8 kHz.
Yes, it must be 8 kHz.
@seepine You'd better reconfigure both to 16 kHz; the accuracy will be better. Use the VOSK_SAMPLE_RATE environment variable for the server.
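On the client side, a rate mismatch can be avoided by sending the file's actual sample rate to the server before any audio. The sketch below assumes the vosk-server websocket protocol accepts a JSON `config` message with `sample_rate`, followed by binary audio chunks and a final `{"eof": 1}` message, and that the server does not reply to the config message. The URI `ws://localhost:2700`, the function names, and the third-party `websockets` package are assumptions for illustration.

```python
import json
import wave


def build_messages(path, chunk_seconds=0.2):
    """Yield the messages to send: a config message, audio chunks, then EOF."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        # Tell the server the real sample rate instead of relying on its default.
        yield json.dumps({"config": {"sample_rate": rate}})
        frames_per_chunk = int(rate * chunk_seconds)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data  # bytes are sent as a binary websocket frame
    yield json.dumps({"eof": 1})


async def transcribe(path, uri="ws://localhost:2700"):
    """Send a WAV file to the server and collect the JSON replies."""
    import websockets  # third-party package (pip install websockets)

    results = []
    async with websockets.connect(uri) as ws:
        for i, message in enumerate(build_messages(path)):
            await ws.send(message)
            if i > 0:  # assumed: the config message gets no reply
                results.append(json.loads(await ws.recv()))
    return results
```

With a server running, `asyncio.run(transcribe("test-ja.wav"))` would return the partial and final results. Reading the rate from the WAV header keeps the client and the audio consistent, whichever rate the file was converted to.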
I set up a vosk-server using Docker and downloaded the 1GB+ models for both Chinese (cn) and Japanese (ja). However, after running recognition on various audio files, including online audio and my own recordings, with the Node websocket client demo, the recognized text is completely unrelated to the actual content.