NaomiProject / Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
https://projectnaomi.com/
MIT License

Synchronous and Asynchronous Mic #402

Closed aaronchantrill closed 4 months ago

aaronchantrill commented 4 months ago

Description

I have turned the Mic class into an abstract class and used it to create two new classes, MicSynchronous and MicAsynchronous. I'm hoping to expand it to all the different mic classes, including the local (text) mic and the batch mic.
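A minimal sketch of how such a split could look is shown here. It is not the code from this PR: only the class names `Mic`, `MicSynchronous`, and `MicAsynchronous` come from the description, while the `audio_source` argument, the `_capture` helper, and the exact `listen()` signature are assumptions made for illustration.

```python
import queue
import threading
from abc import ABC, abstractmethod


class Mic(ABC):
    """Hypothetical abstract base; the real Naomi class has more methods."""

    def __init__(self, audio_source):
        # audio_source is an assumption: any iterable yielding audio blocks.
        self._source = iter(audio_source)

    @abstractmethod
    def listen(self):
        """Return the next block of audio, or None when the source is done."""


class MicSynchronous(Mic):
    def listen(self):
        # Reads straight from the source; nothing is captured between calls.
        return next(self._source, None)


class MicAsynchronous(Mic):
    def __init__(self, audio_source):
        super().__init__(audio_source)
        self._queue = queue.Queue()
        # A background thread keeps capturing while the caller is busy.
        self._thread = threading.Thread(target=self._capture, daemon=True)
        self._thread.start()

    def _capture(self):
        for block in self._source:
            self._queue.put(block)
        self._queue.put(None)  # sentinel: no more audio

    def listen(self):
        # Blocks until the capture thread has produced something.
        return self._queue.get()
```

Because both subclasses expose the same `listen()` method, calling code would not need to know which variant it was given.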

I'm attempting to support both active listen mode (where the computer only starts listening for a command after hearing its wake word - Siri-like) and passive listen mode (where the computer records blocks of audio, checks each block for the wake word, and then checks that same block for a command - Echo-like).
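The difference between the two modes can be sketched with plain strings standing in for blocks of audio. Everything here is hypothetical: the function names, the `WAKE_WORD` constant, and the substring check that stands in for a real passive STT engine are illustrative assumptions, not Naomi's API.

```python
WAKE_WORD = "naomi"  # assumed wake word, for illustration only


def contains_wake_word(block):
    # Stand-in for running a passive STT engine over a block of audio.
    return WAKE_WORD in block.lower()


def active_listen_mode(blocks):
    """Siri-like: hear the wake word, then treat the NEXT block as the command."""
    blocks = iter(blocks)
    for block in blocks:
        if contains_wake_word(block):
            return next(blocks, None)
    return None


def passive_listen_mode(blocks):
    """Echo-like: the block containing the wake word is also the command."""
    for block in blocks:
        if contains_wake_word(block):
            return block
    return None
```

With `["door slam", "Naomi", "what time is it"]`, the active sketch returns the block after the wake word. With `["door slam", "Naomi, what time is it"]`, the passive sketch returns the single block that holds both the wake word and the request.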

Right now, I am having trouble with the expect function when using passive listen mode with the asynchronous listener. This has to do with the pyaudio device's play_file method, which returns when it has finished writing to the queue but before the audio has finished playing. As a result, the next audio clip can start being queued before the previous one has finished playing. If the clips have different frame sizes, this leads to a segmentation fault.
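One general way to avoid this kind of race is to have the caller block until the playback worker reports that the clip is finished, rather than returning as soon as the clip is queued. The sketch below uses only the standard library and simulates playback with `time.sleep`; the `QueuedPlayer` class and its `play_file` signature are hypothetical and are not the pyaudio plugin's actual code or the fix adopted in this PR.

```python
import queue
import threading
import time


class QueuedPlayer:
    """Hypothetical player: a worker thread drains a queue of clips."""

    def __init__(self):
        self._queue = queue.Queue()
        self.played = []  # names of clips that have finished playing
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            clip_name, duration = self._queue.get()
            time.sleep(duration)  # stands in for writing frames to the device
            self.played.append(clip_name)
            self._queue.task_done()  # mark this clip as fully played

    def play_file(self, clip_name, duration, block=True):
        self._queue.put((clip_name, duration))
        if block:
            # Queue.join() returns only after task_done() has been called
            # for every queued clip, i.e. after playback has finished.
            self._queue.join()
```

With `block=True`, a second call to `play_file` cannot begin queueing until the first clip has been marked as played, so two clips with different frame sizes would never be open at the same time in this model.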

I have been testing with the "knock knock joke" and "time" speechhandler plugins. The knock-knock joke plugin uses expect quite a bit. I have been using Pocketsphinx_KWS as my passive STT engine, Pocketsphinx as my special STT engine, and VOSK (which is available here: https://github.com/aaronchantrill/Naomi_VOSK_STT) as my active STT engine. VOSK works well, at least in English, but requires some additional training if you have non-standard words in your vocabulary. I'd like to make VOSK officially available through NPE, but the last time I trained VOSK to recognize some additional words, it required a computer with 32 GiB of RAM. I will test on my Raspberry Pi 5 with 8 GiB and see if it can handle it, but I have low expectations. I would like to add an option to export the Naomi vocabulary so VOSK can be trained on another computer, as it does run well on the Raspberry Pi 4 and 5.

Related Issue

Naomi does not listen while thinking #340

Motivation and Context

The microphone does not currently continue to collect audio while Naomi is processing. This is especially a problem when entering a room, as the VAD still often captures noises as audio to process. If you walk into the room and then address Naomi while it is processing the audio of you walking into the room, it will miss your request.

How Has This Been Tested?

I have tested with both "listen while talking=True" (asynchronous) and "listen while talking=False" (synchronous) modes, and with both "passive_listen=True" (passive listen) and "passive_listen=False" (active listen) modes.

I have been testing by asking Naomi to tell me a knock-knock joke (which uses the "expect" method) and then either allowing it to finish the joke or asking it to tell me the time before it completes the joke:

User: Tell me a knock knock joke
Naomi: Knock knock
User: Naomi, what time is it?
Naomi: It is 12:15 PM right now

Screenshots (if appropriate):

Types of changes

Checklist:

aaronchantrill commented 4 months ago

One thing I'm not really happy with is having the listen() and active_listen() methods return both the transcription and the audio itself. This would be a breaking change, although it probably needs to happen since I also want to add the speaker's identity and may come up with additional needs moving forward. I am planning to create a new Utterance class that will contain additional meta-information that will be made available. I'll define a default property so that referencing the utterance object directly will return the transcription, which should make it work with plugins that call listen() expecting a string.
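Python has no built-in notion of a default property, so one possible way to get this behavior is to make the utterance a subclass of `str`. The sketch below is an assumption about how that could look, not the class that was merged. Only the name `Utterance` comes from the comment above, and the `audio` and `speaker` attributes are illustrative.

```python
class Utterance(str):
    """Hypothetical: behaves as the transcription, carries extra metadata."""

    def __new__(cls, transcription, audio=None, speaker=None):
        # str is immutable, so the text must be set in __new__, not __init__.
        obj = super().__new__(cls, transcription)
        obj.audio = audio      # raw audio for the utterance (assumed field)
        obj.speaker = speaker  # speaker identity (assumed field)
        return obj


utterance = Utterance("what time is it", audio=b"\x00\x01", speaker="aaron")
if "time" in utterance:      # existing plugins can keep treating it as a string
    print(utterance.speaker)  # newer plugins can read the metadata
```

One caveat of this approach is that string methods such as `lower()` or `strip()` return a plain `str`, so the metadata is dropped from the result of those calls.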

aaronchantrill commented 4 months ago

I think this is ready to go now. If anyone is interested, please try it. Let me know if you encounter any issues. If not, I will merge it in a week.