NaomiProject / Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
https://projectnaomi.com/
MIT License

New pocketsphinx #373

Closed aaronchantrill closed 1 year ago

aaronchantrill commented 1 year ago

Description

This speeds up the deployment of Naomi by installing pocketsphinx and phonetisaurus from apt and pip packages rather than compiling them from source. It also replaces some tools from CMUCLMTK and MITLM with custom Python scripts and KenLM, which still has to be built from source but takes a lot less time to build.

I am also now using the Phonetisaurus Python library rather than calling the phonetisaurus executables through subprocess.
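As a rough illustration of the kind of job a small Python script can take over from the language-model toolkits, here is a minimal sketch that builds a sorted vocabulary from a list of phrases. This is not the script from this PR: the function name `build_vocabulary`, the `min_count` parameter, and the choice to lower-case everything are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(phrases, min_count=1):
    """Return the sorted list of words that occur at least min_count times.

    Hypothetical helper for illustration only; words are split on
    whitespace and lower-cased (an assumption, not Naomi's behaviour).
    """
    counts = Counter()
    for phrase in phrases:
        counts.update(phrase.lower().split())
    return sorted(word for word, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    phrases = ["tell me a joke", "what time is it", "tell me the time"]
    for word in build_vocabulary(phrases):
        print(word)
```

The resulting word list could then be handed to whatever tool trains the language model and the pronunciation dictionary.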

Related Issue

Switch to pypi wrapper for phonetisaurus #346

Motivation and Context

I wanted to make it easier for a casual user to download and install Naomi to get a sense of how it works.

How Has This Been Tested?

I tested by running Naomi after manually installing it with this method on a VirtualBox virtual machine running Debian Bullseye. My edits appear to have broken some of the unittests, so I am working on fixing them. I will also install on a Raspberry Pi 4 for testing.

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

aaronchantrill commented 1 year ago

Yep, I've messed something up with the unittests, and it appears to be connected to the use of Mock(), which I don't really understand. The plugin appears to work, although case sensitivity messes up the wake word right now.

aaronchantrill commented 1 year ago

I fixed the problems with the unittests and in the process learned about how to use Mock objects. In testing, this works remarkably well so far. It's amazing how much better the comprehension is just by switching to the KenLM language model. I think the code is also easier to follow now. I still need to go through the whole process of deploying on a Raspberry Pi with a fresh Raspberry Pi OS as a test, but it is looking good so far.
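For anyone else who finds Mock objects confusing, here is a minimal sketch of the general idea, not the actual tests in this PR. The `transcribe` helper and the decoder method names it calls are stand-ins chosen for the example; the point is that a `Mock` can replace the real decoder so the test needs neither a model nor audio hardware.

```python
from unittest import mock


def transcribe(decoder, audio):
    """Hypothetical helper: run one utterance through a decoder object."""
    decoder.start_utt()
    decoder.process_raw(audio, False, True)
    decoder.end_utt()
    hyp = decoder.hyp()
    # Return a list of transcriptions, or an empty list if nothing was heard
    return [hyp.hypstr] if hyp else []


# A Mock accepts any method call and records it for later inspection
fake_decoder = mock.Mock()
fake_decoder.hyp.return_value = mock.Mock(hypstr="NAOMI")

result = transcribe(fake_decoder, b"\x00\x00")

# Check how the code under test used the decoder
fake_decoder.start_utt.assert_called_once_with()
fake_decoder.process_raw.assert_called_once_with(b"\x00\x00", False, True)
fake_decoder.end_utt.assert_called_once_with()
```

Setting `return_value` controls what the mocked method hands back, and the `assert_called_once_with` checks fail the test if the code under test called the decoder differently.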

aaronchantrill commented 1 year ago

I have been testing this on a Libre Renegade. I discovered that after asking Naomi to tell a joke, it would start generating "Passive transcription failed" errors. I don't think this is related to the changes I just made. It has most likely been around for a while but only shows up if you use pocketsphinx for both the passive and special processing engines. Until now I have been using pocketsphinx_kws for passive listening and either DeepSpeech or VOSK for active listening.

The underlying error behind "Passive transcription failed" is not much help: it is just "Failed to start utterance processing". There does not appear to be any conflict when using pocketsphinx for both the active and passive STT engines, and the engine itself should not know which application it is being used for.

I created a reinit function that is called when the transcribe method generates a RuntimeError, which seems to be working for now, but I'd still like to know why creating a new special listener breaks the current passive listener.
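The workaround described above follows a common pattern: catch the `RuntimeError`, rebuild the decoder, and try once more. The sketch below shows that pattern in isolation. It is not the code from this PR; `FlakyDecoder`, `PassiveSTT`, `decoder_factory`, and `reinit_count` are hypothetical names used only for illustration.

```python
class FlakyDecoder:
    """Hypothetical stand-in for an STT decoder that can enter a bad state."""

    def __init__(self):
        self.broken = False

    def decode(self, audio):
        if self.broken:
            raise RuntimeError("Failed to start utterance processing")
        return "hello"


class PassiveSTT:
    """Hypothetical plugin wrapper that rebuilds its decoder on failure."""

    def __init__(self, decoder_factory):
        self._decoder_factory = decoder_factory
        self._decoder = decoder_factory()
        self.reinit_count = 0

    def reinit(self):
        # Throw away the old decoder and build a fresh one
        self._decoder = self._decoder_factory()
        self.reinit_count += 1

    def transcribe(self, audio):
        try:
            return [self._decoder.decode(audio)]
        except RuntimeError:
            # Retry exactly once; a second failure propagates to the caller
            self.reinit()
            return [self._decoder.decode(audio)]


stt = PassiveSTT(FlakyDecoder)
stt._decoder.broken = True  # simulate the decoder getting stuck
print(stt.transcribe(b""), stt.reinit_count)
```

Retrying only once keeps a persistently failing decoder from looping forever. It also hides the root cause, which is why the open question about the special listener still matters.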