alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.35k stars 1.04k forks

SetGrammar API #1584

Open donaldos opened 3 weeks ago

donaldos commented 3 weeks ago

Dear Owner.

We are using the vosk-api together with the websockets package to implement a multi-access service on top of our own model. Specifically, we are building an educational English-speaking system, and one of its learning activities needs an engine that evaluates how well a student reads a given sentence aloud. At the moment, for every piece of content text we build a full integrated graph (HCLG.fst) by creating a language model, pronunciation dictionary, and acoustic model. This is far too inefficient for per-content use.

If we had HCLr.fst and Gr.fst, as shipped with the stock vosk-api models, we could use the vosk module's SetGrammar to restrict recognition to the domain appropriate for each piece of content.

Is it possible to generate HCLr.fst and Gr.fst from a Kaldi-trained final.mdl, together with a language model built from our text and a lexicon?

Thank you for your valuable guidance.

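For reference, the grammar-restricted decoding described above can be sketched with the vosk Python bindings. This is only a sketch: the model path and target sentence are placeholders, and the grammar argument / SetGrammar call only take effect when the model ships with the lookahead graphs (HCLr.fst and Gr.fst) rather than a monolithic HCLG.fst:

```python
import json

def make_grammar(sentence: str) -> str:
    """Build the JSON word-list grammar that vosk expects, adding [unk]
    so out-of-grammar (e.g. mispronounced) words map to unknown."""
    words = sentence.lower().split()
    return json.dumps(words + ["[unk]"])

grammar = make_grammar("the cat sat on the mat")

# Hedged usage sketch (requires a downloaded lookahead-capable model):
# from vosk import Model, KaldiRecognizer
# model = Model("model")                    # path is a placeholder
# rec = KaldiRecognizer(model, 16000, grammar)
# ...feed audio with rec.AcceptWaveform(data)...
# rec.SetGrammar(make_grammar("a new target sentence"))  # switch per content item
```

Switching grammars with SetGrammar per content item avoids rebuilding any FSTs at runtime, which is the efficiency win being asked about.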

nshmyrev commented 3 weeks ago

Sure, you can use this script:

https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/mkgraph_lookahead.sh

donaldos commented 3 weeks ago

Thank you. I will try it.

donaldos commented 3 weeks ago

I applied the script you suggested in my run.sh and proceeded. mkgraph.sh worked fine, but mkgraph_lookahead.sh failed with the error below. I'm trying to find the cause.

Thanks for your help.

fstdeterminizestar lang/L_disambig.fst
fstcomposecontext --context-size=3 --central-position=1 --read-disambig-syms=lang/phones/disambig.int --write-disambig-syms=graph/disambig_ilabels_3_1.int graph/ilabels_3_1.44271 graph/L_disambig_det.fst
make-h-transducer --disambig-syms-out=graph/disambig_tid.int --transition-scale=1.0 graph/ilabels_3_1 am/tree am/final.mdl
fstdeterminizestar
add-self-loops --disambig-syms=graph/disambig_tid.int --self-loop-scale=0.1 --reorder=true am/final.mdl
ERROR: FstHeader::Read: Bad FST header: standard input
ERROR (add-self-loops[5.5.1137~2-51744]:main():add-self-loops.cc:98) add-self-loops: error reading input FST.

[ Stack-Trace: ]
/mnt/prj/workspace.speech/kaldi.testbed/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f49a2ee41c3]
add-self-loops(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x559af299b16d]
add-self-loops(main+0x6fe) [0x559af299a307]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f49a2958083]
add-self-loops(_start+0x2e) [0x559af2999b4e]

kaldi::KaldiFatalError
ERROR: FstHeader::Read: Bad FST header: standard input
ERROR: FstHeader::Read: Bad FST header: standard input

nshmyrev commented 3 weeks ago

You need to read the whole output, because the real issue is somewhere above.

donaldos commented 3 weeks ago

I've looked through the output above, but I can't find any particular clues. In the shell script that defines the overall flow, I used the following command to generate the final HCLG, and it worked fine:

utils/mkgraph.sh --remove-oov $lang_dir $am_dir $graph_dir

However, when I added the following command (or substituted it for the one above) to generate HCLr and Gr, the same error occurred:

utils/mkgraph_lookahead.sh --remove-oov $lang_dir $am_dir $graph_dir

Any further advice? Thanks in advance.

nshmyrev commented 3 weeks ago

Try the variant that takes an ARPA LM instead of $lang_dir, as in

https://github.com/alphacep/vosk-api/blob/master/python/example/colab/vosk-adaptation.ipynb

Also make sure you have OpenGrm installed.
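A hedged sketch of the ARPA variant suggested above, modeled on the linked notebook. The LM filename and directory variables are placeholders, and the exact positional-argument order should be checked against the usage message that mkgraph_lookahead.sh prints; the --self-loop-scale and --compose-graph values here are assumptions for a chain model:

```shell
# Assumes Kaldi built with the OpenFst lookahead extension and OpenGrm
# (ngram* tools) on PATH. lm.arpa is your content-specific language model.
lang_dir=data/lang    # lang dir WITHOUT a precomposed G.fst
am_dir=am             # contains final.mdl and tree
graph_dir=graph

# Passing the ARPA LM directly lets the script build Gr.fst with OpenGrm
# and HCLr.fst for runtime lookahead composition, instead of composing a
# monolithic HCLG from G.fst in $lang_dir.
utils/mkgraph_lookahead.sh --self-loop-scale 1.0 --remove-oov --compose-graph \
    "$lang_dir" "$am_dir" lm.arpa "$graph_dir"
```

If the Gr.fst build step fails, a missing OpenGrm install is a likely cause, per the comment above.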

donaldos commented 2 weeks ago

Thank you for all the help you have given us.

The goal of our speech recognition service is an engine that helps children read aloud as they learn to speak English.

Therefore, we needed recognition restricted to the given text only, while still detecting mispronunciations when they occur. We created HCLr.fst and Gr.fst and configured a recognition engine suited to this task using SetGrammar and unknown-word handling.

For the acoustic model, we will build our own database of Korean children's English speech to support this task.

It would also be helpful for phonics learning if phoneme-by-phoneme evaluation (scoring) were supported. Does your team have any plans to support this feature?

Thank you very much for your help.

nshmyrev commented 2 weeks ago

It would also be helpful for phonics learning if phoneme-by-phoneme evaluation (scoring) were supported. Does your team have any plans to support this feature?

We do not have any plans for that