alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.38k stars 1.04k forks

How many new words and phrases can be added in real time without affecting overall recognition accuracy? #1453

Open kerolos opened 8 months ago

kerolos commented 8 months ago

Issue Title:

Questions about Adding Phrases in the Dynamic Graph Update

Issue Description:

The Vosk API examples show how to add phrases through a dynamic graph update, using the following code:

Python

rec.SetGrammar('["one zero one two three oh", "four five six", "seven eight nine zero", "[unk]"]')

or:

rec = KaldiRecognizer(model, wf.getframerate(), '["one zero one two three oh", "four five six", "seven eight nine zero", "[unk]"]')

C++

void Recognizer::UpdateGrammarFst(char const *grammar)

My first question is: Do all those words ("one zero one two three oh," "four five six," "seven eight nine zero," "[unk]") need to be present in the lexicon HCL.fst (words.txt)? Should we create a phone sequence to enable the recognizer to handle these unknown words?

Is it possible to feed this list of sequence sentences or words from a file rather than hard-coding them as a list?

My second question is: How many phrases can be added in real time without significantly affecting the overall recognition accuracy (roughly, what is the optimal number)?

nshmyrev commented 8 months ago

Do all those words ("one zero one two three oh," "four five six," "seven eight nine zero," "[unk]") need to be present in the lexicon HCL.fst (words.txt)?

Yes

Should we create a phone sequence to enable the recognizer to handle these unknown words?

Only as a separate compilation step.
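Since every grammar word has to be in the model vocabulary already, it can help to check a phrase list before passing it to the recognizer. The following is a minimal sketch, not part of the Vosk API: it assumes the model ships a Kaldi-style symbol table (one `word id` pair per line) in a `words.txt` file, and both helper names are hypothetical. It treats `[unk]` as a special token rather than a vocabulary word.

```python
from pathlib import Path


def load_vocabulary(words_txt):
    """Read a Kaldi-style symbol table: one 'word id' pair per line."""
    vocab = set()
    for line in Path(words_txt).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if parts:
            # The first column is the word, the second its integer id.
            vocab.add(parts[0])
    return vocab


def missing_words(phrases, vocab):
    """Return the words used in the phrases that the vocabulary lacks.

    Words are reported once each, in the order they are first seen.
    The special token "[unk]" is skipped (an assumption of this sketch).
    """
    missing = []
    for phrase in phrases:
        for word in phrase.split():
            if word != "[unk]" and word not in vocab and word not in missing:
                missing.append(word)
    return missing
```

Any word this reports would have to be added through the separate compilation step mentioned above; the location of `words.txt` depends on the model and may differ from what is assumed here.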

Is it possible to feed this list of sequence sentences or words from a file rather than hard-coding them as a list?

No, you have to pass them as a JSON object.
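The recognizer itself only accepts the JSON string, but the string can be built from a file in user code. A minimal sketch, assuming a plain-text file with one phrase per line; the function name `grammar_from_file` and the file name `phrases.txt` are hypothetical and not part of Vosk:

```python
import json
from pathlib import Path


def grammar_from_file(path, add_unk=True):
    """Build a JSON grammar string from a text file with one phrase per line."""
    phrases = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        # Collapse repeated whitespace and skip blank or duplicate lines.
        phrase = " ".join(line.split())
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    if add_unk and "[unk]" not in phrases:
        # Keep the "[unk]" entry used in the grammar examples above.
        phrases.append("[unk]")
    # ensure_ascii=False keeps non-ASCII words as plain UTF-8 text.
    return json.dumps(phrases, ensure_ascii=False)


# Usage with the calls quoted in this issue (requires the vosk package):
#   rec.SetGrammar(grammar_from_file("phrases.txt"))
# or:
#   rec = KaldiRecognizer(model, wf.getframerate(), grammar_from_file("phrases.txt"))
```

The result is the same kind of string as the hard-coded list in the original question, so the recognizer sees no difference between the two.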

How many phrases can be added in real time without significantly affecting the overall recognition accuracy (roughly, what is the optimal number)?

Usually several hundred. For a bigger vocabulary update it is better to recompile the LM as in https://alphacephei.com/vosk/lm

kerolos commented 8 months ago

Thanks for your quick response :) Some more questions:

nshmyrev commented 8 months ago

Is there any way to add new words on the fly during recognition (instead of creating a new graph)?

No

If not, is there any way to implement that?

No easy way in Vosk unfortunately. Otherwise there are publications like https://arxiv.org/abs/2003.09024