NaomiProject / Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
https://projectnaomi.com/
MIT License

VOSK STT Engine #280

Closed: aaronchantrill closed this issue 5 months ago

aaronchantrill commented 4 years ago

Detailed Description

VOSK (https://alphacephei.com/vosk/) is a new open-source STT toolkit/engine built on Kaldi and optimized to run on the Raspberry Pi. Building a language model is described at https://alphacephei.com/vosk/adaptation.html

Context

Learning to train and adapt the acoustic model, language model, and dictionary is enormously helpful in speech recognition: the more you can constrain what the recognizer expects to hear, the better the recognition becomes. Naomi has an advantage here because we already have a list of phrases that a language model can be built from directly.

Possible Implementation

VOSK can be installed with a simple pip3 install vosk. The training tools are basically Kaldi's, but it is not necessary to install Kaldi just to use VOSK. The adaptation page is a good starting point for developing a language model from the intent templates.
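
For anyone who wants to try it, a minimal transcription sketch with the vosk Python package looks something like this (the model directory and WAV file names are placeholders, not anything from Naomi itself):

```python
# Minimal sketch: transcribe a 16-bit mono PCM WAV file with vosk.
import json
import wave

from vosk import Model, KaldiRecognizer

model = Model("vosk-model-small-en-us-0.15")  # any unpacked VOSK model directory
wf = wave.open("test.wav", "rb")              # 16-bit mono PCM audio
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(json.loads(rec.FinalResult())["text"])
```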

aaronchantrill commented 1 year ago

I am working on this, and the reliability of VOSK is pretty amazing. It is also pretty lightweight and easy to install. I am currently trying to adapt the language model using the instructions at https://alphacephei.com/vosk/lm. From what I understand right now, I need to convert all the speechhandler intent templates into a JSGF file, use that to generate an ARPA statistical model, and then interpolate that with the default VOSK language model. Phonetisaurus works for generating a custom dictionary, and the VOSK compile model (https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-compile.zip) comes with a pre-trained FST to use with Phonetisaurus for generating new pronunciations.
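
As a rough illustration of that first step (the phrases, file name, and rule names here are made up, not Naomi's actual speechhandler API), turning a flat list of phrases into a JSGF grammar could look like this:

```python
# Hypothetical sketch: write a list of phrases out as a JSGF grammar
# that can then be fed to the ARPA-generation step.
phrases = [
    "what time is it",
    "what is the weather like",
    "turn off the light",
]

lines = ["#JSGF V1.0;", "grammar naomi;"]
for i, phrase in enumerate(phrases):
    lines.append(f"<phrase{i}> = {phrase};")
lines.append(
    "public <command> = "
    + " | ".join(f"<phrase{i}>" for i in range(len(phrases)))
    + ";"
)

with open("naomi.jsgf", "w") as f:
    f.write("\n".join(lines) + "\n")
```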

Akul2010 commented 1 year ago

I found this GitHub repository for building a custom model: https://github.com/matteo-39/vosk-build-model

aaronchantrill commented 6 months ago

@Akul2010 Sorry, I meant to get back to you earlier. That is a very interesting set of instructions for building a VOSK model, but it is overkill for anything we'd be doing. Using those instructions, you could add a whole new language to VOSK, which is awesome.

We just need to customize the Gr.fst and HCLr.fst files with custom words and phrases. The process is described at https://alphacephei.com/vosk/lm; it supports English, French, German, and Russian and is pretty straightforward, but it requires installing both Kaldi and SRILM.

Kaldi is usually pretty easy to install, although the last time I installed it on a new Bookworm system I had to trick the installer into thinking that I had Python 2.7 installed, since it still thinks it needs it for the install process even though Python 2.7 is no longer available through my package manager. I have seen some discussion that Python 2.7 was really only required for Pocketsphinx, which has since moved to Python 3, so hopefully Kaldi will drop that requirement soon.

SRILM's source is available for academic and government use, but it is not freely licensed, and you have to register an account to download it. On my Raspberry Pi I had to trick it into compiling on aarch64 by modifying the makefiles as described here: https://github.com/G10DRAS/SRILM-on-RaspberryPi

There are other, more freely licensed libraries that can be used instead of SRILM, including KenLM, which is very lightweight and which we are already using to build language models for Pocketsphinx (although with a much smaller vocabulary). I'm not sure about the process of converting a language model file to FST format, though.
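
Just as a sketch (assuming KenLM's lmplz binary is on the PATH and that corpus.txt holds one phrase per line), building the ARPA file itself is the easy part; the missing piece is still the ARPA-to-FST conversion:

```python
# Sketch of building a small ARPA language model with KenLM's lmplz.
# --discount_fallback is needed for tiny corpora like ours; turning the
# resulting ARPA into an HCLG.fst still requires the Kaldi graph tools.
import subprocess

with open("corpus.txt") as corpus, open("naomi_lm.arpa", "w") as arpa:
    subprocess.run(
        ["lmplz", "-o", "3", "--discount_fallback"],
        stdin=corpus,
        stdout=arpa,
        check=True,
    )
```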

Overall, the process of getting the Raspberry Pi set up is not simple, but once it is set up, all you have to do is drop your vocabulary into the db/extra.txt file, run compile-graph.sh, and wait for it to finish; you can then pick up your new G.fst and HCL.fst files from exp/chain/tdnn/graph.
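
That loop could even be scripted; this is only a sketch, and it assumes you are running from inside the unpacked compile model directory with example phrases of your own:

```python
# Hypothetical wrapper around the recipe above: append custom phrases to
# db/extra.txt, rerun compile-graph.sh, and wait for it to finish.
import subprocess

phrases = ["turn on the kitchen light", "set a timer for five minutes"]

with open("db/extra.txt", "a") as extra:
    for phrase in phrases:
        extra.write(phrase + "\n")

subprocess.run(["bash", "compile-graph.sh"], check=True)
# The new G.fst and HCL.fst end up under exp/chain/tdnn/graph/
```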

The last time I tried this, I ran it on a couple of computers and kept running into memory issues. I finally got it working under WSL on a Windows machine with 32 GiB of RAM. I'm getting ready to try again with my 8 GiB Raspberry Pi 5.

aaronchantrill commented 6 months ago

The Raspberry Pi 5 was able to do it! It did cut off all communication for a little while and I'm not sure how long it took, but it built an HCLG.fst file which I am using now, and it does recognize my custom vocabulary.

Akul2010 commented 6 months ago

Great! Do you plan on making it available in one of the next few builds of Naomi?

aaronchantrill commented 6 months ago

@Akul2010 I think what makes sense for now is to write up the steps required for generating a custom vocabulary and put a check in place that notifies you if there are any words in the current "languagemodel" file (i.e., ~/.config/naomi/vocabularies/en-US/VOSK STT/default/languagemodel) that do not also appear in the VOSK words.txt file (i.e., ~/.config/naomi/vosk/vosk-model-en-us-0.22-lgraph/graph/words.txt).
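
Something along the lines of this simplified sketch (the paths are the examples above; the real plugin code will differ):

```python
# Sketch of the planned check: list any words from the Naomi language
# model file that are missing from the VOSK model's words.txt.
import os

lm_path = os.path.expanduser(
    "~/.config/naomi/vocabularies/en-US/VOSK STT/default/languagemodel"
)
words_path = os.path.expanduser(
    "~/.config/naomi/vosk/vosk-model-en-us-0.22-lgraph/graph/words.txt"
)

with open(words_path) as f:
    # Kaldi words.txt lines look like "word id"; keep only the word column
    known = {line.split()[0] for line in f if line.strip()}

with open(lm_path) as f:
    vocabulary = {word.lower() for line in f for word in line.split()}

unknown = sorted(vocabulary - known)
if unknown:
    print("Words VOSK does not know:", ", ".join(unknown))
```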

aaronchantrill commented 6 months ago

It would be good to see if we can use KenLM to generate a language model and then convert that to an HCLG.fst file. I'm really not comfortable requiring people to go out and register with SRI so they can download a copy of SRILM.

aaronchantrill commented 6 months ago

It is currently available from https://github.com/aaronchantrill/Naomi_VOSK_STT, but I haven't added it to the NPE yet because it's so difficult to customize the vocabulary. I think if I add the check that warns the user when their vocabulary uses any words Vosk does not currently know, along with a link to a detailed description of how to generate a custom Vosk vocabulary, that will be enough for me to feel good about adding it to the NPE.

aaronchantrill commented 5 months ago

@Akul2010 I have updated the Naomi_VOSK_STT plugin at https://github.com/aaronchantrill/Naomi_VOSK_STT - it still doesn't do the language model adaptation automatically, but it does give you warnings if there are any words in your vocabulary that it doesn't know. I added a credit at the bottom for you, since we never managed to get your pull request merged. Thanks! I'll be submitting this plugin to the NPE later today, and I'll be recording a new "How to install Naomi" video soon.

Akul2010 commented 5 months ago

Great! Thank you!