YangangCao opened 3 weeks ago
You can simply use a more generic language model; it is easier to design than a custom grammar and more robust to mistakes. You can even use an LLM biased with a prompt. It all comes down to the quality of the acoustic model anyway. A modern AM can also help a lot.
If I use a more generic language model, I think words outside the reference text will appear and cause more mismatch (I am not sure). Actually, I only want to recognize the words in the reference text, which is why I want to modify the grammar ASR.
It is always about the bias weight. If you bias the LM with an appropriate weight, you can find an optimal point where most of the words are recognized but it is still possible to recognize unrelated speech.
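For reference, in the Vosk Python API a grammar is passed to `KaldiRecognizer` as a JSON-encoded list of phrases; including the special `"[unk]"` token gives the decoder an escape hatch so unrelated speech is not forced onto a reference word. A minimal sketch (the model path and reference sentence are placeholders):

```python
import json

# Reference text to assess; its words form the grammar.
reference_text = "the quick brown fox jumps over the lazy dog"

# "[unk]" lets the recognizer emit an unknown token for
# out-of-grammar speech instead of its closest grammar word.
grammar = json.dumps(reference_text.split() + ["[unk]"])
print(grammar)

# Usage with Vosk (not run here; requires a downloaded model):
# from vosk import Model, KaldiRecognizer
# model = Model("model")  # path to an acoustic model directory
# rec = KaldiRecognizer(model, 16000, grammar)
```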
OK, thanks. By the way, can I run grammar ASR on the GPU? I can't see a SetGrm function in batch_recognizer.cc.
You cannot.
Hi, dear author,
The grammar in Vosk is very useful, but there are also some problems for me, for example:
Can Vosk or the language model support more advanced functions to solve these problems?
Actually, I want to do English pronunciation assessment. I use the reference text to generate a grammar, use the grammar to detect excess and missed words in the audio, and then use the output text from Vosk to align the audio, so I finally get the phone-audio alignments. This approach costs a lot of compute resources. Can we design an alignment network that allows for missed and extra words and aligns directly?
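One inexpensive way to detect missed and excess words, once the recognizer output is available, is a standard word-level edit-distance alignment between the reference text and the hypothesis. This is a plain dynamic-programming sketch, not part of the Vosk API:

```python
def align_words(ref, hyp):
    """Align reference and hypothesis word lists by edit distance.

    Returns a list of (ref_word_or_None, hyp_word_or_None) pairs:
    (w, None) marks a missed word, (None, w) an excess word.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion (missed)
                           dp[i][j - 1] + 1,         # insertion (excess)
                           dp[i - 1][j - 1] + cost)  # match / substitution
    # Backtrace to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1)):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # missed word
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # excess word
            j -= 1
    pairs.reverse()
    return pairs

# Example: "b" is missed by the speaker.
print(align_words(["a", "b", "c"], ["a", "c"]))
# → [('a', 'a'), ('b', None), ('c', 'c')]
```

This gives the missed/excess word positions directly from the text; the phone-level time alignment would still come from the recognizer's word timings.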
Thanks for your kind reply~