Open SumeetGohil opened 4 years ago
This is not specific to Android.
Greetings. Has there been any progress on this issue?
It would be nice to implement https://github.com/kaldi-asr/kaldi/blob/master/src/online2bin/online2-wav-nnet3-wake-word-decoder-faster.cc
I wish the wake word detection function were also implemented in other programming languages (Python / JavaScript).
As a quick fix one might init recognizer like this:
rec = KaldiRecognizer(model, wf.getframerate(), '[ "keyphrase", "[unk]" ]')
and it will detect either the keyphrase or [unk].
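The quick fix above can be sketched a little more fully. This is only a sketch: the helper names `build_keyphrase_grammar` and `make_keyphrase_recognizer` are made up for illustration, and lowercasing the phrases assumes the model's vocabulary is lowercase.

```python
import json


def build_keyphrase_grammar(keyphrases):
    """Build the JSON grammar string passed as the third KaldiRecognizer argument.

    "[unk]" is appended so that anything outside the keyphrases maps to it.
    Lowercasing is an assumption about the model's vocabulary.
    """
    phrases = [p.strip().lower() for p in keyphrases]
    return json.dumps(phrases + ["[unk]"])


def make_keyphrase_recognizer(model_path, sample_rate, keyphrases):
    """Create a recognizer restricted to the given keyphrases (hypothetical helper)."""
    # Imported lazily so the grammar helper works without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate, build_keyphrase_grammar(keyphrases))
```

For example, `build_keyphrase_grammar(["potato"])` yields the same kind of string as the literal in the snippet above, with `potato` in place of `keyphrase`.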
Thank you for the quick fix idea. Unfortunately this does not seem to work in my case. If the keyword is potato and the speaker says "there is a potato on the table", the recognizer will detect potato, but only after the speaker finishes the whole sentence. It would be great if the recognizer could stop and raise a flag as soon as potato has been pronounced.
Edit: now I know [unk] means noise to be filtered out.
You can try Snowboy for the KWS task. The project is no longer maintained, but several good trained models are available (e.g. for the Alexa and Snowboy wake words), along with plenty of examples in different programming languages. I'm using it right now together with Vosk. So far so good.
Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?
@sskorol Thanks, Snowboy works great. But the universal wake words are limited, custom-trained wake words are person-specific only, and the project has been closed down.
What do you mean by limited? Anyone can use them w/o restrictions. And in terms of custom wake words, do you really think that the training process would be drastically different? You'd still need a lot of samples recorded by different people (age, gender, nationality, accent, etc.) to make a generic and robust model. So the problem is not in the tool. There are a number of solutions that let you train your own wake word. But the main problem is still the data. No data, no generic wake word. And for exotic wake words you won't be able to generate enough training data on your own. That's why the Snowboy devs made an attempt to collect data via a public crowdsourcing service. But it failed, as most people are lazy and don't wanna spend their time recording wake words for someone else.
The other important thing is the resources required by the actual KWS engine. Ideally, it should be fully independent of the ASR engine and bundled into the mic array firmware, to avoid continuously streaming data over the network to the ASR engine and to avoid false positive triggers. Moreover, you can't run Vosk on e.g. esp32 (Matrix Voice) or Respeaker Core, and with Snowboy you can. That's why I don't believe it's reasonable to try to solve this task with a heavy ASR engine. It should be an independent, lightweight, cross-platform API.
Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?
You can limit the audio fed to Vosk to 10 seconds, or stop recognition when rec.partialResult() returns more than 10 words.
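Both limits can be sketched as small checks run after each audio chunk. This is only a sketch: the helper names and the constants `MAX_WORDS` and `MAX_SECONDS` are made up for illustration, and the byte arithmetic assumes 16-bit mono PCM. In the Python bindings the partial result comes from `rec.PartialResult()`, which returns JSON with a `partial` field.

```python
import json

MAX_WORDS = 10     # hypothetical limit taken from the question
MAX_SECONDS = 10   # hypothetical limit taken from the question


def partial_word_count(partial_json):
    """Count the words in a partial result such as '{"partial": "there is a potato"}'."""
    return len(json.loads(partial_json).get("partial", "").split())


def max_audio_bytes(sample_rate, seconds=MAX_SECONDS, sample_width=2, channels=1):
    """Number of PCM bytes that cover `seconds` of audio (16-bit mono by default)."""
    return sample_rate * seconds * sample_width * channels


def should_stop(partial_json, bytes_fed, sample_rate):
    """True once the partial result exceeds MAX_WORDS or MAX_SECONDS of audio was fed."""
    return (partial_word_count(partial_json) > MAX_WORDS
            or bytes_fed >= max_audio_bytes(sample_rate))
```

The caller would track `bytes_fed` itself, adding `len(data)` after each chunk it passes to the recognizer.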
That's a great idea to know when to stop. Thank you.
Hi Stan,
Did you try this solution, and were the results acceptable? I'm also looking to handle KWS with Vosk.
Hello pga-avionics. Yes, I tried limiting the number of words in partial results and parsing the final result for my keyword, and the results are quite satisfactory. There are still some failures and some slowness in recognition, but overall it is a good workaround while we wait for a real implementation of KWS.
If the keyword is potato and the speaker says "there is a potato on the table", the recognizer will detect potato, but only after the speaker finishes the whole sentence.
That seems untrue to me. Using Vosk you can get the result word by word and so trigger your action right afterward.
It would be great if the recognizer could stop and raise a flag as soon as potato has been pronounced.
That's currently feasible using vosk-api!
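A minimal sketch of that idea: feed the audio chunk by chunk and check the partial result after each one, stopping as soon as the keyword shows up. The function name `detect_keyword` is made up for illustration. The recognizer is assumed to expose `AcceptWaveform`, `PartialResult` and `Result` as in the Vosk Python bindings, and how early the keyword appears depends on how quickly the model updates its partial hypothesis.

```python
import json


def detect_keyword(recognizer, chunks, keyword):
    """Return the index of the first chunk after which `keyword` is seen, else None.

    `recognizer` is any object with AcceptWaveform/PartialResult/Result
    methods that return the JSON strings Vosk produces.
    """
    for index, chunk in enumerate(chunks):
        if recognizer.AcceptWaveform(chunk):
            # An utterance ended: inspect the finalized text.
            text = json.loads(recognizer.Result()).get("text", "")
        else:
            # Still mid-utterance: inspect the running hypothesis.
            text = json.loads(recognizer.PartialResult()).get("partial", "")
        if keyword in text.split():
            return index  # raise the flag without waiting for the sentence to end
    return None
```

This matches single-word keywords only. A multi-word keyphrase would need a substring or sequence check on `text` instead.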
If I provide only a few words to train and test with, will it detect just those selected words?
Please share a keyword search example using the same aar lib.