alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Wake up word etc #1310

Open project-owner opened 1 year ago

project-owner commented 1 year ago

Hi,

How is a wake-up word/phrase supposed to work with Vosk? Is there any dedicated functionality in Vosk that handles it? I'm going to create a separate recognizer with a fixed vocabulary containing just the wake-up word. When it recognizes the wake-up word, I will stop it and start another recognizer with a different fixed vocabulary. Is that the way to go?

Is there command template support in Vosk, as in some other frameworks? For example, I need to recognize the following command: 'play song money by pink floyd'. The template for English could be 'play song {SONG_NAME} by {BAND_NAME}'. For German: 'song {SONG_NAME} von {BAND_NAME} spielen'.

Does Vosk support only words that are in its dictionary? For example, I need to find songs by U2. In this case Vosk returns 'find you too', which is correct, but there is no band called 'you too'.

There are many singers/bands which are definitely not in the dictionary. Should I create a dedicated dictionary in this case and list all possible artists/groups? Is it possible to ask Vosk to just output characters when it cannot find a word in the dictionary?

Is it possible to mix languages? For example, Google Translate can do that partially: 'find albums by death cab for cutie' -> 'найти альбомы death cab для милашки'. Vosk returns this for the latter phrase from Google Translate: 'найти альбомы без капли милашки'.

Thank you!

omlins commented 1 year ago

@project-owner :

> How is a wake-up word/phrase supposed to work with Vosk? Is there any dedicated functionality in Vosk that handles it? I'm going to create a separate recognizer with a fixed vocabulary containing just the wake-up word. When it recognizes the wake-up word, I will stop it and start another recognizer with a different fixed vocabulary. Is that the way to go?

I believe this goes beyond the scope of Vosk. Such high level logic is generally implemented in applications or libraries that use Vosk. That said, as a general approach I believe it is very much the way to go and I have been using a similar approach in JustSayIt: https://github.com/omlins/JustSayIt.jl
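The two-recognizer approach discussed above could be sketched roughly as follows. This is a minimal sketch, not a Vosk feature: `WakeWordSwitch`, `grammar_json`, `listen`, the wake phrase, and the command list are hypothetical names chosen for illustration. It assumes the Vosk Python bindings and a model that accepts a runtime grammar (as far as I know, only the small models with a dynamic graph do).

```python
import json

WAKE_PHRASE = "hey computer"          # hypothetical wake-up phrase
COMMANDS = ["start", "stop", "next"]  # hypothetical command vocabulary


def grammar_json(phrases):
    """Build the JSON phrase list passed as KaldiRecognizer's third argument.

    "[unk]" lets the recognizer report out-of-vocabulary speech as unknown.
    """
    return json.dumps(list(phrases) + ["[unk]"])


class WakeWordSwitch:
    """Application-level state: asleep until the wake phrase, then one command."""

    def __init__(self, wake_phrase, commands):
        self.wake_phrase = wake_phrase
        self.commands = set(commands)
        self.awake = False

    def feed(self, text):
        """Feed one final recognition result; return a command or None."""
        text = text.strip()
        if not text:
            return None  # silence: keep the current state
        if not self.awake:
            if text == self.wake_phrase:
                self.awake = True
            return None
        self.awake = False  # go back to sleep after one utterance
        return text if text in self.commands else None


def listen(model_path, audio_chunks, sample_rate=16000):
    """Yield commands from an iterable of raw 16-bit mono PCM byte chunks."""
    from vosk import Model, KaldiRecognizer  # imported lazily; requires vosk

    model = Model(model_path)
    switch = WakeWordSwitch(WAKE_PHRASE, COMMANDS)
    wake_rec = KaldiRecognizer(model, sample_rate, grammar_json([WAKE_PHRASE]))
    cmd_rec = KaldiRecognizer(model, sample_rate, grammar_json(COMMANDS))
    for chunk in audio_chunks:
        rec = cmd_rec if switch.awake else wake_rec
        if rec.AcceptWaveform(chunk):  # true once an utterance is complete
            text = json.loads(rec.Result()).get("text", "")
            command = switch.feed(text)
            if command:
                yield command
```

Keeping the switching logic in a small class separate from the audio loop makes it possible to test the state machine without a microphone or a model.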

For your other questions, I would also be curious about @nshmyrev's answer.

nshmyrev commented 1 year ago

> I'm going to create a separate recognizer with a fixed vocabulary containing just the wake-up word. When it recognizes the wake-up word, I will stop it and start another recognizer with a different fixed vocabulary. Is that the way to go?

Vosk works offline and doesn't require a wake-up word; you can simply recognize the command.

> Is there command template support in Vosk, as in some other frameworks? For example, I need to recognize the following command: 'play song money by pink floyd'. The template for English could be 'play song {SONG_NAME} by {BAND_NAME}'. For German: 'song {SONG_NAME} von {BAND_NAME} spielen'.

You can do that with language model adaptation as described in https://alphacephei.com/vosk/lm
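One way to approximate templates on the application side is to expand them into a flat phrase list, which can then be used as a recognizer grammar or as adaptation text, and to parse the recognized text back into slots. The sketch below is only an illustration under that assumption: `expand_template` and `parse_command` are hypothetical helpers, not part of Vosk, and the slot values are made up.

```python
import itertools
import json
import re


def expand_template(template, slots):
    """Return every phrase obtained by filling the {SLOT} placeholders."""
    names = re.findall(r"\{(\w+)\}", template)
    phrases = []
    for values in itertools.product(*(slots[name] for name in names)):
        phrases.append(template.format(**dict(zip(names, values))))
    return phrases


def parse_command(template, text):
    """Match recognized text against the template; return slot values or None."""
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None


template = "play song {SONG_NAME} by {BAND_NAME}"
slots = {"SONG_NAME": ["money", "time"], "BAND_NAME": ["pink floyd"]}
phrases = expand_template(template, slots)
grammar = json.dumps(phrases + ["[unk]"])  # could be passed to KaldiRecognizer
print(parse_command(template, "play song money by pink floyd"))
```

The phrase list grows as the product of the slot sizes, so for a large music catalogue, rebuilding the language model as described on the linked page is likely the more practical route.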

> Does Vosk support only words that are in its dictionary? For example, I need to find songs by U2. In this case Vosk returns 'find you too', which is correct, but there is no band called 'you too'.

Yes, you have to add those groups to the dictionary.

> There are many singers/bands which are definitely not in the dictionary. Should I create a dedicated dictionary in this case and list all possible artists/groups? Is it possible to ask Vosk to just output characters when it cannot find a word in the dictionary?

Yes, list them in a dedicated dictionary. No, Vosk cannot fall back to outputting characters.
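Until the dictionary is extended, an application could map the spoken form Vosk returns onto the artist name it actually needs. To be clear, this is a different technique from the dictionary change recommended above: it is plain post-processing of the transcript, and `ARTIST_ALIASES` and `canonicalize` are hypothetical names used only for this sketch.

```python
import re

# Hypothetical table: what the recognizer outputs -> the name the app wants.
ARTIST_ALIASES = {
    "you too": "U2",
    "death cab for cutie": "Death Cab for Cutie",
}


def canonicalize(text, aliases=ARTIST_ALIASES):
    """Replace known spoken forms in a transcript with canonical artist names."""
    # Longest alias first, so multi-word names win over shorter overlaps.
    for spoken in sorted(aliases, key=len, reverse=True):
        pattern = r"\b" + re.escape(spoken) + r"\b"
        # A function replacement avoids backslash handling in the name.
        text = re.sub(pattern, lambda match, name=aliases[spoken]: name, text)
    return text


print(canonicalize("find you too"))
```

The obvious limitation is ambiguity: a sentence that genuinely contains the words "you too" would be rewritten as well, so a lookup like this only makes sense in the artist slot of a known command.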

> Is it possible to mix languages? For example, Google Translate can do that partially: 'find albums by death cab for cutie' -> 'найти альбомы death cab для милашки'. Vosk returns this for the latter phrase from Google Translate: 'найти альбомы без капли милашки'.

Yes, you can introduce English words when you change the language model as linked above.

project-owner commented 1 year ago

Let's say I am listening to the radio. There is a chance that control words like 'start' or 'stop' will come from the radio rather than from me. A wake-up word could help in this case. It could serve as a kind of "switch".

Thank you for the info about 'Vosk Language Model Adaptation'.

Best regards