YangangCao opened 3 weeks ago
You can simply use a more generic language model; it is easier to design than a custom grammar and more robust to mistakes. You can even use an LLM biased with a prompt. It all comes down to the quality of the acoustic model anyway. A modern AM can also help a lot.
If I use a more generic language model, I think words outside the reference text will appear and cause more mismatch (I am not sure). Actually, I only want to recognize the words in the reference text, which is why I want to modify the grammar ASR.
It is always about the bias weight. If you bias the LM with an appropriate weight, you can find an optimal point where most of the words are recognized but it is still possible to recognize unrelated speech.
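For reference, in the Vosk Python API a grammar is passed to `KaldiRecognizer` as a JSON-encoded list of phrases; including the special `"[unk]"` token gives the decoder an escape hatch so unrelated speech is not forced onto a reference word. A minimal sketch (the model path and reference sentence are placeholders):

```python
import json

# Reference text to assess; its words form the grammar.
reference_text = "the quick brown fox jumps over the lazy dog"

# "[unk]" lets the recognizer emit an unknown token for
# out-of-grammar speech instead of its closest grammar word.
grammar = json.dumps(reference_text.split() + ["[unk]"])
print(grammar)

# Usage with Vosk (not run here; requires a downloaded model):
# from vosk import Model, KaldiRecognizer
# model = Model("model")  # path to an acoustic model directory
# rec = KaldiRecognizer(model, 16000, grammar)
```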
OK, thanks. By the way, can I run grammar ASR on the GPU? I can't see a SetGrm function in batch_recognizer.cc.
You cannot.
Hi, dear author,
The grammar in Vosk is very useful, but there are also some problems for me, for example:
Can Vosk or the language model support more advanced functions to solve these problems?
Actually, I want to do English pronunciation assessment. I use the reference text to generate a grammar, use the grammar to detect excess and missed words in the audio, and then use the output text from Vosk to align the audio, so I finally get the phone-audio alignments. This approach costs a lot of compute resources. Can we design an alignment network that allows for missed and extra words and aligns directly?
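One inexpensive way to detect missed and excess words, once the recognizer output is available, is a standard word-level edit-distance alignment between the reference text and the hypothesis. This is a plain dynamic-programming sketch, not part of the Vosk API:

```python
def align_words(ref, hyp):
    """Align reference and hypothesis word lists by edit distance.

    Returns a list of (ref_word_or_None, hyp_word_or_None) pairs:
    (w, None) marks a missed word, (None, w) an excess word.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion (missed)
                           dp[i][j - 1] + 1,         # insertion (excess)
                           dp[i - 1][j - 1] + cost)  # match / substitution
    # Backtrace to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1)):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # missed word
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # excess word
            j -= 1
    pairs.reverse()
    return pairs

# Example: "b" is missed by the speaker.
print(align_words(["a", "b", "c"], ["a", "c"]))
# → [('a', 'a'), ('b', None), ('c', 'c')]
```

This gives the missed/excess word positions directly from the text; the phone-level time alignment would still come from the recognizer's word timings.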
Thanks for your kind reply~