alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.7k stars 1.08k forks

Keyword spotting / Keyword Search / Keyword Detection / Keyphrase Detection Demo #107

Open SumeetGohil opened 4 years ago

SumeetGohil commented 4 years ago

Could you please share a keyword search example that uses the same AAR library?

nshmyrev commented 4 years ago

This is not specific to Android.

KhArtNJava commented 4 years ago

Greetings. Is there any progress on this issue?

nshmyrev commented 4 years ago

It would be nice to implement https://github.com/kaldi-asr/kaldi/blob/master/src/online2bin/online2-wav-nnet3-wake-word-decoder-faster.cc

hyansuper commented 4 years ago

I wish wake word detection were implemented in other programming languages (Python / JavaScript).

nshmyrev commented 3 years ago

As a quick fix, one might initialize the recognizer like this:

  rec = KaldiRecognizer(model, wf.getframerate(), '[ "keyphrase", "[unk]" ]')

and it will detect either the keyphrase or the [unk] token.
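The quick fix above can be sketched as a small script. This is only a sketch under stated assumptions: the helper names (`keyphrase_grammar`, `heard_keyphrase`, `spot_in_wav`) are hypothetical, the input is assumed to be a mono 16-bit PCM WAV file, and, as far as I know, runtime grammars only work with models that ship a dynamic graph (the small models). Only `Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result` and `FinalResult` come from the Vosk Python binding.

```python
import json
import wave


def keyphrase_grammar(keyphrase):
    """Build the JSON grammar string: the keyphrase plus the [unk] catch-all."""
    return json.dumps([keyphrase.lower(), "[unk]"])


def heard_keyphrase(result_json, keyphrase):
    """True if a Vosk result string ({"text": "..."}) contains the keyphrase."""
    text = json.loads(result_json).get("text", "")
    # Pad with spaces so that only whole words or phrases match.
    return f" {keyphrase.lower()} " in f" {text} "


def spot_in_wav(model_dir, wav_path, keyphrase):
    """Run a grammar-restricted recognizer over a WAV file (hypothetical helper)."""
    # Imported lazily so the helpers above work without Vosk installed.
    from vosk import Model, KaldiRecognizer

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_dir), wf.getframerate(),
                          keyphrase_grammar(keyphrase))
    found = False
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        # AcceptWaveform returns True when an utterance has been finalized.
        if rec.AcceptWaveform(data):
            found = found or heard_keyphrase(rec.Result(), keyphrase)
    found = found or heard_keyphrase(rec.FinalResult(), keyphrase)
    wf.close()
    return found
```

A call would look like `spot_in_wav("model", "test.wav", "potato")`, where both paths are placeholders for a local model directory and recording.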

stanislas-brossette commented 3 years ago

Thank you for the quick-fix idea. Unfortunately, it does not seem to work in my case. If the keyword is "potato" and the speaker says "there is a potato on the table", the recognizer will detect "potato", but only after the speaker finishes the whole sentence. It would be great if the recognizer could stop and raise a flag as soon as "potato" has been pronounced.

hyansuper commented 3 years ago
  1. What does [unk] mean?
  2. How can I detect more than one keyphrase? Say I want to detect "Hello" OR "Hi": how do I specify that? Thank you.

Edit: now I know [unk] means noise to be filtered out.
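For the second question, the grammar argument is a JSON list, so several alternative keyphrases can be listed next to "[unk]". A minimal sketch, assuming lowercase phrases that exist in the model's vocabulary; the helper names `grammar_for` and `matched_keyphrases` are hypothetical and not part of Vosk:

```python
import json


def grammar_for(keyphrases):
    """Build a grammar string with every keyphrase plus the [unk] catch-all."""
    phrases = [p.lower().strip() for p in keyphrases]
    return json.dumps(phrases + ["[unk]"])


def matched_keyphrases(result_json, keyphrases):
    """Return the keyphrases found in a Vosk result string ({"text": "..."})."""
    text = f' {json.loads(result_json).get("text", "")} '
    # Space padding keeps "hi" from matching inside a longer word.
    return [p for p in keyphrases if f" {p.lower()} " in text]


# The string below is what would be passed as the third KaldiRecognizer argument.
print(grammar_for(["Hello", "Hi"]))
```

The resulting string would replace the single-keyphrase grammar in the `KaldiRecognizer(...)` call shown earlier in the thread.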

sskorol commented 3 years ago

You can try Snowboy for the KWS task. The project is not maintained anymore, but there are several good pre-trained models (e.g. for the Alexa and Snowboy wake words) and plenty of examples available in different programming languages. I'm using it right now together with Vosk. So far so good.

stanislas-brossette commented 3 years ago

Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

hyansuper commented 3 years ago

@sskorol Thanks, Snowboy works great. But the universal wake words are limited, custom-trained wake words are person-specific, and the project has been shut down.

sskorol commented 3 years ago

What do you mean by limited? Anyone can use them without restrictions. As for custom wake words, do you really think the training process would be drastically different? You'd still need a lot of samples recorded by different people (age, gender, nationality, accent, etc.) to make a generic and robust model. So the problem is not the tool. There are a number of solutions that let you train your own wake word, but the main problem is still data. No data, no generic wake word. And for exotic wake words you won't be able to generate enough training data on your own. That's why the Snowboy devs attempted to collect data via a public crowdsourcing service. But it failed, as most people are lazy and don't want to spend their time recording wake words for someone else.

The other important thing is the resources required by the actual KWS engine. Ideally, it should be fully independent of the ASR engine and bundled into the mic array firmware, to avoid continuously streaming data over the network to the ASR engine and to avoid false-positive triggers. Moreover, you can't run Vosk on e.g. an ESP32 (Matrix Voice) or a ReSpeaker Core, and with Snowboy you can. That's why I don't believe it's reasonable to try to solve this task with a heavy ASR engine. It should be an independent, lightweight, cross-platform API.

hyansuper commented 3 years ago

> Is it possible to limit the length of a recognized sentence? For example 10 words max, or 10 seconds?

You can limit the audio fed to Vosk to 10 seconds, or stop recognition when rec.partialResult() returns more than 10 words.
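Both limits can be checked with a few lines around the recognition loop. The sketch below assumes 16-bit mono PCM audio and the Python binding, where the partial result is a JSON string of the form `{"partial": "..."}` returned by `rec.PartialResult()`. The constants and helper names are hypothetical.

```python
import json

MAX_WORDS = 10      # assumed limit on words in the partial result
MAX_SECONDS = 10.0  # assumed limit on the audio fed to the recognizer


def partial_word_count(partial_json):
    """Count the words in a Vosk partial result string ({"partial": "..."})."""
    return len(json.loads(partial_json).get("partial", "").split())


def seconds_fed(bytes_fed, sample_rate, sample_width=2, channels=1):
    """Convert a byte count of raw PCM audio into seconds."""
    return bytes_fed / (sample_rate * sample_width * channels)


def should_stop(partial_json, bytes_fed, sample_rate):
    """True once either the word limit or the time limit has been exceeded."""
    return (partial_word_count(partial_json) > MAX_WORDS
            or seconds_fed(bytes_fed, sample_rate) >= MAX_SECONDS)
```

In a streaming loop one would call `should_stop(rec.PartialResult(), bytes_fed, sample_rate)` after each `rec.AcceptWaveform(data)` and, once it returns true, stop feeding audio and read `rec.FinalResult()`.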

stanislas-brossette commented 3 years ago

That's a great idea to know when to stop. Thank you.

pga-avionics commented 3 years ago

Hi Stan,

Did you try this solution, and were the results acceptable? I'm also looking into handling KWS with Vosk.

stanislas-brossette commented 3 years ago

Hello pga-avionics. Yes, I tried limiting the number of words in partial results and parsing the final result for my keyword, and the results are quite satisfactory. There are still some failures and some slowness in recognition, but overall it is a good workaround while looking forward to a real implementation of KWS.

solyarisoftware commented 2 years ago

> If the keyword is potato and the speaker says there is a potato on the table the recognizer will detect potato, but only after the speaker finishes the whole sentence.

That doesn't seem right to me. With Vosk you can get the result word by word, and so trigger your action right afterward.

> It would be great if the recognizer could stop and raise a flag as soon as potato has been pronounced.

That's currently feasible using vosk-api!
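One way to do this is to inspect the partial result after every audio chunk and fire as soon as the keyword shows up. The sketch below simulates the stream with a list of partial result strings in the format returned by `rec.PartialResult()` in the Python binding. The helper names are hypothetical, and since partial hypotheses can still be revised by the recognizer, an early trigger may occasionally turn out to be a false positive.

```python
import json


def keyword_in_partial(partial_json, keyword):
    """True if the keyword is one of the words in a partial result string."""
    words = json.loads(partial_json).get("partial", "").split()
    return keyword.lower() in words


def first_trigger(partials, keyword):
    """Index of the first partial result containing the keyword, or None."""
    for i, partial_json in enumerate(partials):
        if keyword_in_partial(partial_json, keyword):
            return i
    return None


# Simulated partial results, one per audio chunk fed to the recognizer.
stream = [
    '{"partial": "there is"}',
    '{"partial": "there is a potato"}',
    '{"partial": "there is a potato on the table"}',
]
print(first_trigger(stream, "potato"))
```

In a live loop, the same check would run on `rec.PartialResult()` whenever `rec.AcceptWaveform(data)` returns false, so the action can fire before the speaker finishes the sentence.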

Reethuch commented 1 year ago

If I provide only a few words for training and testing, will it detect just those selected words?