alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
8.23k stars 1.13k forks

Prevent recognizer from recognizing long sentences #360

Open stanislas-brossette opened 3 years ago

stanislas-brossette commented 3 years ago

Hello, Is it possible to force the recognizer to stop listening to very long sentences? I'm trying to recognize orders while the TV is on. The problem is that the recognizer seems to keep listening until there is a silence, which does not occur often on TV, so it ends up recognizing very long sentences in which my orders are hidden. As a result, an order can only be processed long after it was given. So is there a way to force the recognizer to stop after a given duration or a given number of words? Best, Stanislas

nshmyrev commented 3 years ago

You probably need a separate module to separate the commands from the TV audio (probably beamforming with multiple microphones), or just AEC (acoustic echo cancellation).

sskorol commented 3 years ago

Technically, you can "mute" the Vosk input after some timeout and then "unmute" it on some other condition. But that would be just an ugly workaround.
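For illustration only, here is a minimal sketch of that timeout workaround. It assumes 16-bit mono PCM input and a recognizer object exposing `AcceptWaveform`, `Result` and `FinalResult`, as Vosk's Python `KaldiRecognizer` does. The class name `TimeBoxedRecognizer` and the `max_seconds` parameter are hypothetical names made up for this example, not part of Vosk.

```python
import json


class TimeBoxedRecognizer:
    """Hypothetical wrapper: forces a final result after max_seconds of audio.

    Assumes `recognizer` behaves like vosk.KaldiRecognizer and that the
    audio is raw PCM with `bytes_per_sample` bytes per sample, one channel.
    """

    def __init__(self, recognizer, sample_rate, max_seconds=3.0, bytes_per_sample=2):
        self.rec = recognizer
        # Number of audio bytes that corresponds to max_seconds.
        self.max_bytes = int(sample_rate * bytes_per_sample * max_seconds)
        self.fed_bytes = 0

    def feed(self, chunk):
        """Feed one audio chunk; return text if an utterance ended, else None."""
        self.fed_bytes += len(chunk)
        if self.rec.AcceptWaveform(chunk):
            # The recognizer found a natural endpoint (silence).
            self.fed_bytes = 0
            return json.loads(self.rec.Result()).get("text", "")
        if self.fed_bytes >= self.max_bytes:
            # No silence within the time box: cut the utterance here.
            self.fed_bytes = 0
            return json.loads(self.rec.FinalResult()).get("text", "")
        return None
```

With Vosk this would presumably be used as `TimeBoxedRecognizer(KaldiRecognizer(model, 16000), 16000)`, calling `feed()` on each chunk read from the microphone. Note that a forced cut can land in the middle of a word or command, which is part of why this remains a workaround rather than a fix.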

As Nikolay mentioned, the only valid approach is to use a microphone array, which lets you split audio sources between beams, and then apply beamforming algorithms, which help to focus on a single direction.
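To make the idea concrete, the simplest beamforming scheme is delay-and-sum: each microphone channel is shifted by the delay with which sound from the chosen direction reaches it, and the aligned channels are averaged. The sketch below is a toy version that assumes integer sample delays already known for the target direction. The function name `delay_and_sum` is made up for this example; real microphone arrays rely on dedicated DSP code or hardware.

```python
def delay_and_sum(channels, delays):
    """Toy delay-and-sum beamformer.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   per-channel arrival delay (in samples) of the target source.
    Returns the average of the channels after aligning them on the target.
    """
    # Only keep the samples for which every shifted channel has data.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]
```

Sound arriving from the target direction adds up coherently after alignment, while sound from other directions (the TV, for example) is shifted out of phase between channels and is attenuated by the averaging.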

On the other hand, you won't be able to choose which direction is valid without additional techniques such as wake word detection. Generally speaking, you need to focus on the beam that detected a wake word and ignore the other sources. Moreover, I don't believe you can do that with a single microphone, as it will be a mess of audio chunks coming from different directions.