Closed leonardltk closed 2 years ago
If you only need to predict a few words, you should train a much more lightweight speech classification model such as MatchboxNet. If you absolutely want to use QuartzNet, which is a speech recognition model, you can try several things: finetuning on those specific words, using an LM that is trained primarily on variations of those words, or using a newer technique such as adapters to train on a very small dataset of those words.
If you're using Riva, you can also use word boosting to try to force your model to predict those words. But know that, by design, speech recognition will never be as efficient or as accurate as a specialized speech classification model.
thanks @titu1994 for the feedback!
Yes, definitely, I'm thinking of trying a lighter-weight model next time. For now, I'm interested in how to perform the word boosting method that you mentioned. Is there a tutorial or resource you could point me to for that?
The reason is that the sentences I want to predict might be
yes its me
no its not me
so the overall dictionary might be dynamic throughout my experimentation.
NeMo does not support word boosting. The Facebook Flashlight decoder supports it within the Riva framework; you can export your NeMo QuartzNet model to use it in that framework.
Oh I see, thanks! In that case, perhaps we can have word boosting as a feature request?
We won't be supporting that. It has a very messy C++ codebase dependency which we don't want to introduce to NeMo.
For word boosting, your other option is to use this decoder with NeMo models: https://github.com/kensho-technologies/pyctcdecode
For a simple script like this:
It performs well. However, if I change the problem to predicting just a fixed set of words, for example
yes, no
how should I augment the language model part of it?
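One way to approach a small, fixed (but changeable) phrase list without touching the language model is to score each candidate phrase against the CTC output and pick the most likely one. This is not something the replies in this thread propose; it is a plain CTC forward-algorithm sketch in standard-library Python. It assumes a character-level model whose blank token is the last output index, and it uses synthetic log-probabilities in place of real model output. `VOCAB`, `fake_frames` and `pick_phrase` are hypothetical names for illustration.

```python
import math

NEG_INF = float("-inf")
VOCAB = list(" abcdefghijklmnopqrstuvwxyz")  # assumed character set
BLANK = len(VOCAB)  # assumption: CTC blank is the last output index


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, target, blank=BLANK):
    """CTC forward algorithm: log P(target | per-frame log-probs)."""
    # Interleave blanks around the target: b t1 b t2 b ... tn b
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    alpha = [NEG_INF] * len(ext)
    alpha[0] = log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        prev, alpha = alpha, [NEG_INF] * len(ext)
        for s, tok in enumerate(ext):
            paths = [prev[s]]  # stay on the same symbol
            if s >= 1:
                paths.append(prev[s - 1])  # advance by one
            if s >= 2 and tok != blank and tok != ext[s - 2]:
                paths.append(prev[s - 2])  # skip the blank in between
            alpha[s] = logsumexp(paths) + frame[tok]
    # Valid paths end on the last symbol or the trailing blank.
    return logsumexp(alpha[-2:])


def pick_phrase(log_probs, phrases):
    """Return the best-scoring phrase and all phrase scores."""
    scores = {
        p: ctc_log_likelihood(log_probs, [VOCAB.index(c) for c in p])
        for p in phrases
    }
    return max(scores, key=scores.get), scores


def fake_frames(text, peak=0.9):
    """Synthetic stand-in for model output: char frame, then blank frame."""
    n = len(VOCAB) + 1
    rest = math.log((1 - peak) / (n - 1))
    frames = []
    for c in text:
        for idx in (VOCAB.index(c), BLANK):
            row = [rest] * n
            row[idx] = math.log(peak)
            frames.append(row)
    return frames


phrases = ["yes its me", "no its not me"]  # can be changed at any time
best, scores = pick_phrase(fake_frames("yes its me"), phrases)
print(best)
```

Since the phrase list is only an argument to `pick_phrase`, it can change between experiments without retraining or rebuilding anything. The cost grows with the number of phrases, so this only makes sense for a small candidate set.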