EsakkiSundar closed 3 years ago
Vosk uses Kaldi internally. You can look at the sources to see how: https://github.com/alphacep/vosk-api/blob/master/src/kaldi_recognizer.cc
As far as I understand, the VAD here comes from Kaldi, whereas with DeepSpeech it comes from Google's WebRTC VAD.
Thanks a lot for your quick responses.
One more question: I'm curious about the main reasons (the USP) why Vosk was created. What benefits does Vosk aim to provide that Kaldi fails to deliver?
The answer is on the front page; Kaldi provides none of these:
Vosk is an offline open source speech recognition toolkit. It enables speech recognition models for 17 languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino.
Vosk models are small (50 Mb) but provide continuous large vocabulary transcription, zero-latency response with streaming API, reconfigurable vocabulary and speaker identification.
Speech recognition bindings implemented for various programming languages like Python, Java, Node.JS, C#, C++ and others.
Vosk supplies speech recognition for chatbots, smart home appliances, virtual assistants. It can also create subtitles for movies, transcription for lectures and interviews.
Vosk scales from small devices like Raspberry Pi or Android smartphone to big clusters.
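To make the streaming API point concrete, here is a minimal Python sketch of how the loop usually looks. It assumes the `vosk` package is installed and a model has been downloaded; the helper names `make_recognizer` and `stream_transcripts` are made up for this example and are not part of Vosk. As far as I know, `AcceptWaveform` returns true once the recognizer decides an utterance has ended, which is where the Kaldi-side endpointing mentioned earlier shows up.

```python
import json
from typing import Iterable, Iterator, Tuple


def make_recognizer(model_path: str, sample_rate: int = 16000):
    """Build a Vosk recognizer (hypothetical helper; needs `pip install vosk` and a model)."""
    # Imported lazily so the streaming loop below can be exercised without Vosk installed.
    from vosk import Model, KaldiRecognizer
    return KaldiRecognizer(Model(model_path), sample_rate)


def stream_transcripts(recognizer, chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield ("partial" | "final", text) pairs as raw PCM chunks arrive."""
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # The recognizer decided the utterance ended: read the finished result.
            text = json.loads(recognizer.Result()).get("text", "")
            kind = "final"
        else:
            # Still inside an utterance: read the running hypothesis.
            text = json.loads(recognizer.PartialResult()).get("partial", "")
            kind = "partial"
        if text:
            yield kind, text
    # Flush whatever audio is still buffered after the last chunk.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield "final", text
```

In practice you would feed it chunks read from a microphone or from a 16 kHz mono 16-bit WAV file (for example via the standard `wave` module) and print each pair as it arrives. Doing the same thing with plain Kaldi means wiring up the online decoder yourself in C++.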
I would like to understand the difference between Vosk and Kaldi. When should we use Kaldi over Vosk, and vice versa? If someone could share their thoughts or point me to an article, it would be a great help.