alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Question for Implementation for Short Commands #1569

Open MarioIb14 opened 1 month ago

MarioIb14 commented 1 month ago

Hello, I am working on a project that requires the user to say short commands such as "in" and "out". Is there a way to modify the code for single-word commands so that commands do not chain together? Also, is there a way to reduce the time that the function waits for silence?
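One way to keep chained words from being treated as a single command is to post-process whatever text the recognizer returns and split it into individual commands. The sketch below assumes the final result is the JSON string returned by Vosk's `KaldiRecognizer.Result()`, which carries the transcript under the `"text"` key. `COMMANDS` and `split_commands` are hypothetical names used only for illustration.

```python
import json

# Hypothetical example command set. "[unk]" is deliberately left out so that
# out-of-vocabulary speech is dropped instead of being forwarded as a command.
COMMANDS = {"in", "out", "left", "right", "up", "down",
            "pause", "stop", "next", "start"}


def split_commands(result_json: str) -> list[str]:
    """Turn one recognizer result into a list of separate commands.

    Assumes a JSON object with the transcript under "text",
    e.g. '{"text" : "left up"}'. Every known word becomes its own command,
    so a chained phrase like "left up" yields ["left", "up"].
    """
    text = json.loads(result_json).get("text", "")
    return [word for word in text.split() if word in COMMANDS]


if __name__ == "__main__":
    print(split_commands('{"text" : "left [unk] up"}'))
```

This does not stop the decoder from producing several words in one utterance; it only makes sure each word is handled as its own command downstream.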

nshmyrev commented 1 month ago

What kind of project, exactly?

MarioIb14 commented 1 month ago

Sorry, I am looking to use the Python version. I meant that the project converts voice to text for short commands, and the text will be used to interact with a GUI program through single-word commands. I started to use SetGrammar and EndpointerMode (I cannot import EndpointerMode from vosk for some reason).
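When `from vosk import EndpointerMode` fails, a first check is whether the installed package exposes that name at all. One plausible, unconfirmed cause is that the installed `vosk` release predates the feature. The sketch below uses only the standard library (`importlib.util.find_spec`, `importlib.metadata.version`) to report the installed version and whether `EndpointerMode` is present; `endpointer_mode_status` is a hypothetical helper name.

```python
import importlib.metadata
import importlib.util


def endpointer_mode_status():
    """Return (installed vosk version or None, whether EndpointerMode exists)."""
    if importlib.util.find_spec("vosk") is None:
        return None, False  # vosk is not installed in this interpreter
    import vosk  # imported lazily so the check also runs without vosk

    try:
        version = importlib.metadata.version("vosk")
    except importlib.metadata.PackageNotFoundError:
        version = None  # e.g. a source build without package metadata
    return version, hasattr(vosk, "EndpointerMode")


if __name__ == "__main__":
    print(endpointer_mode_status())
```

If the second value is `False`, the name simply is not available in that build, so the import error reflects the installed version rather than a mistake in the calling code.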

nshmyrev commented 1 month ago

I'm asking what kind of short commands your software is going to recognize.

MarioIb14 commented 1 month ago

The ones I am planning to use are '["in", "out", "left", "right", "up", "down", "pause", "stop", "next", "start", "[unk]"]'.
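Vosk takes a grammar as a JSON-encoded list of phrases, like the string quoted above, either as the optional third argument of the `KaldiRecognizer` constructor or through `SetGrammar`. Generating the string with `json.dumps` avoids quoting mistakes. The snippet below only builds and checks the string; the commented-out lines show where it would be passed, assuming a model directory named `model` and 16 kHz audio (both assumptions). `build_grammar` is a hypothetical helper.

```python
import json

# Command words from this thread. "[unk]" is appended by build_grammar so that
# speech outside the list maps to an explicit "unknown" token instead of being
# forced onto the nearest command.
COMMAND_WORDS = ["in", "out", "left", "right", "up", "down",
                 "pause", "stop", "next", "start"]


def build_grammar(words):
    """Return the grammar as a JSON string, with "[unk]" as the last entry."""
    return json.dumps(list(words) + ["[unk]"])


grammar = build_grammar(COMMAND_WORDS)
print(grammar)

# Assumed usage with Vosk (not executed here):
#   from vosk import Model, KaldiRecognizer
#   rec = KaldiRecognizer(Model("model"), 16000, grammar)
#   # or, on an existing recognizer: rec.SetGrammar(grammar)
```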

MarioIb14 commented 1 month ago

Sorry, do you need any more clarification on the specific short commands?

nshmyrev commented 1 month ago

Yes, I need to understand the application you are creating.

MarioIb14 commented 1 month ago

We have a MATLAB script that runs in the background and has a GUI with a text input box, into which our software needs to write voice commands on each iteration. On each iteration the script checks the text input and uses the commands listed above to control the movement of a robotic arm (if there is no new command, it repeats the previous one). So we need a program that runs in the background, checks whether a specific single-word command has been said, writes it into the input box, and presses Enter. I am trying to make this as fast as possible by limiting the number of words (setting a grammar), preventing the program from stringing words into sentences (if possible), and reducing the time from partial to final output. When I tested the microphone example code, I noticed that it detected the command word very quickly in the partial output, but then waited some time before outputting the final result.
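Since the partial output already contains the command word, one way to avoid waiting for the final result is to act on the partial result as soon as a command appears in it. The sketch below is a hypothetical helper (`CommandTrigger` is not part of Vosk). It assumes partial results are the JSON strings returned by `KaldiRecognizer.PartialResult()` (transcript under `"partial"`), and that the caller passes `Result()` to `on_final` whenever `AcceptWaveform()` returns True. Within one utterance the partial transcript grows ("left", then "left up"), so the trigger remembers how many words it has already handled and emits only the new ones, which also keeps chained words as separate commands.

```python
import json


class CommandTrigger:
    """Fire each command word as soon as it shows up in a partial result.

    Hypothetical helper, not part of Vosk. It never retracts a fired command,
    so if the decoder later revises a word, the earlier firing stands.
    """

    def __init__(self, commands):
        self.commands = set(commands)
        self.handled = 0  # words of the current utterance already processed

    def on_partial(self, partial_json):
        """Return newly recognized commands from a PartialResult() string."""
        words = json.loads(partial_json).get("partial", "").split()
        new_words = words[self.handled:]
        self.handled = max(self.handled, len(words))
        return [w for w in new_words if w in self.commands]

    def on_final(self, result_json):
        """Emit any words the partials missed, then start a new utterance."""
        words = json.loads(result_json).get("text", "").split()
        leftover = [w for w in words[self.handled:] if w in self.commands]
        self.handled = 0
        return leftover


# Assumed Vosk loop (not executed here; send_to_gui is hypothetical):
#   if rec.AcceptWaveform(data):
#       fired = trigger.on_final(rec.Result())
#   else:
#       fired = trigger.on_partial(rec.PartialResult())
#   for cmd in fired:
#       send_to_gui(cmd)
```

The design trades some robustness for latency: reacting to partials removes the wait for the trailing silence, at the cost of occasionally acting on a word the decoder would have revised.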