Open nshmyrev opened 3 years ago
Hi,
I'm trying to add OOV words to the recognizer vocabulary at runtime, as shown in https://github.com/alphacep/vosk-api/blob/master/python/example/test_words.py, by passing extra words like so:
rec = KaldiRecognizer(model, wf.getframerate(), '["paleoanthropology", "[unk]"]')
I'm primarily adding technical words to the recognizer, but the WER isn't changing. I'm a noob at this and trying to get better results than the pre-trained model gives.
Any help would be really appreciated, thank you!
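For reference, a minimal sketch of building that third argument programmatically rather than hand-writing the JSON string. The helper name `build_grammar` is hypothetical (not part of vosk-api), and lowercasing the words is an assumption that the model's vocabulary is lowercase. As far as I understand, this list restricts recognition to words the model already knows, so a word missing from the model's lexicon would likely be ignored rather than added, which may be why the WER does not change.

```python
import json


def build_grammar(words, include_unk=True):
    """Return a JSON array string suitable as the grammar argument.

    Hypothetical helper: strips, lowercases (assumed model convention),
    drops duplicates while keeping order, and appends "[unk]" if requested.
    """
    cleaned = []
    for word in words:
        word = word.strip().lower()
        if word and word not in cleaned:
            cleaned.append(word)
    if include_unk and "[unk]" not in cleaned:
        cleaned.append("[unk]")
    return json.dumps(cleaned)


grammar = build_grammar(["paleoanthropology"])

# With vosk installed and a model loaded, the string is passed as in the post:
# rec = KaldiRecognizer(model, wf.getframerate(), grammar)
```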
@shashankmc your question is not related to this issue
Sorry, will raise another issue!
This is a bigger project. This approach to handling OOVs has some potential:
https://github.com/alumae/kaldi-offline-transcriber/blob/master/local/get_ctm_unk.sh
https://deepai.org/publication/advanced-rich-transcription-system-for-estonian-speech
It will require a G2P model and some hacks, though.