Adding one custom word to be recognized

alphacep / vosk-android-demo

Offline speech recognition for Android with Vosk library.

Apache License 2.0


Adding one custom word to be recognized #138

Open DmitriySosnovskiy opened 3 years ago

DmitriySosnovskiy commented 3 years ago

Hello. I'm developing an Android app and I want to add one custom word to the vocabulary. I'm using the lightweight Russian model, and I create the recognizer like this:

    Recognizer recognizer = new Recognizer(model, 16000.f, "[\"глазар\"]");

where 'глазар' is the custom word I'd like to add. However, this is what the log shows:

    W/VoskAPI: KaldiRecognizer():kaldi_recognizer.cc:89) Ignoring word missing in vocabulary: 'глазар'

I've read the manuals on the website and checked previous issues, and as far as I can see the only way to solve my problem is to train a custom model with the Kaldi toolkit, using about 1 hour of prepared voice recordings for training.

Still, I'd like to ask: is there a simpler way to do this, without training a new model?

abouquet commented 3 years ago

Hello,

I made the same feature request months ago. I don't have enough resources to take on a heavy training step, so instead I added a notion of "alignment" to this engine.

It is easy to set up. Add an extra step after onFinalResult that uses a map as a dictionary. When you say the unknown word, the engine will match it to the closest word it knows. Pass the onFinalResult output through the dictionary map, which replaces that closest word with the output you have configured.

And voilà, it's done.

Note that this should only be used in a known grammar context (such as a finite list of keywords), so that it does not degrade the overall quality of recognition.
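A minimal Java sketch of this post-processing step is below. It is not the actual code added to the engine. WordAligner and its methods are hypothetical names, and it assumes the caller has already extracted the recognized text from the JSON string passed to onFinalResult.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: rewrites in-vocabulary stand-in words to the custom word. */
class WordAligner {
    // Key: a word the model actually outputs. Value: the custom word it stands for.
    private final Map<String, String> replacements = new HashMap<>();

    /** Registers a stand-in word and the custom word that should replace it. */
    public void add(String recognized, String custom) {
        replacements.put(recognized.toLowerCase(Locale.ROOT), custom);
    }

    /** Replaces every whitespace-separated token that has a mapping; other tokens pass through. */
    public String align(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the input is blank
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            String key = token.toLowerCase(Locale.ROOT);
            out.append(replacements.getOrDefault(key, token));
        }
        return out.toString();
    }
}
```

For the word in this issue, you would first find out which known word the model returns when you say 'глазар' (for example 'глаза', which is only a guess), register it with add("глаза", "глазар"), and then call align on the text of every final result. The mapping is exact-match per token, so it only makes sense with a restricted keyword list, as noted above.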

LuggerMan commented 3 years ago

Hello, @DmitriySosnovskiy, the easiest way is this: https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/ (at least you won't need a CUDA cluster lmao)

For Russian you can skip the pronunciation model (look at the scripts in the big Russian model) and probably just add words with a static probability to avoid running ngram-count