alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Return list of words ignored because they are missing from the vocabulary #1314

Open tdrp opened 1 year ago

tdrp commented 1 year ago

If you use a grammar, you will get a bunch of messages like this one:

UpdateGrammarFst():recognizer.cc:308) Ignoring word missing in vocabulary: 'snivel'
UpdateGrammarFst():recognizer.cc:308) Ignoring word missing in vocabulary: 'classifier'

Is there currently some way to directly check which words are flagged as missing? Or should I modify UpdateGrammarFst, something like this?

    while (getline(ss, token, ' ')) {
        int32 id = model_->wordsyms->Find(token);
        if (id == kNoSymbol) {
            KALDI_WARN << "Ignoring word missing in vocabulary: '" << token << "'";
            missing_words.push_back(token);
        } else {
            sentence.push_back(id);
        }
    }

nshmyrev commented 1 year ago

> Is there currently some way to directly check which words are flagged as missing?

There is:

https://github.com/alphacep/vosk-api/blob/master/src/vosk_api.h#L74

The bindings have this method too.
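
As a rough illustration of checking words from the caller's side instead of patching UpdateGrammarFst, here is a minimal C++ sketch. It assumes the linked header line declares a per-word lookup such as `vosk_model_find_word` (the thread does not name the function), and that such a lookup returns a negative value for unknown words. `FindMissingWords` and `WordLookup` are hypothetical names introduced here, not part of vosk-api; the lookup is passed in as a callback so the sketch compiles without the library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback type (not part of vosk-api): returns a non-negative
// symbol id for a known word and a negative value for an unknown one.
using WordLookup = std::function<int(const std::string&)>;

// Hypothetical helper (not part of vosk-api): collects every word in the
// grammar phrases that the lookup reports as unknown, in first-seen order
// and without duplicates.
std::vector<std::string> FindMissingWords(const std::vector<std::string>& phrases,
                                          const WordLookup& lookup) {
    assert(lookup && "lookup callback must be set");
    std::vector<std::string> missing;
    for (const std::string& phrase : phrases) {
        std::istringstream ss(phrase);
        std::string token;
        // operator>> splits on any run of whitespace, so repeated spaces
        // do not produce empty tokens.
        while (ss >> token) {
            const bool unknown = lookup(token) < 0;
            const bool already_listed =
                std::find(missing.begin(), missing.end(), token) != missing.end();
            if (unknown && !already_listed) {
                missing.push_back(token);
            }
        }
    }
    return missing;
}
```

With a real model, the callback could presumably be a lambda that forwards to the C API lookup, for example `[model](const std::string& w) { return vosk_model_find_word(model, w.c_str()); }`, run over the grammar phrases before they are handed to the recognizer.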

tdrp commented 1 year ago

Thank you - it seems to be exposed in the "kotlin" folder but not in any of the others.