MrBuddyCasino / ESP32_Alexa

An Alexa Smart Speaker project for the ESP32.
Mozilla Public License 2.0

Improvement Suggestion - Wake on Voice #5

Closed ibaranov-cp closed 6 years ago

ibaranov-cp commented 7 years ago

Hey! This is totally awesome!

Have you thought of using something like this to accomplish voice wakeup?

https://www.digikey.ca/product-detail/en/pui-audio-inc/PMM-3738-VM1010-R/668-1585-1-ND/7346070

Would it be possible to detect Alexa's standard "I didn't get that" response, and suppress it, thereby ensuring it didn't go off if I made a loud noise or similar?

Would you mind if I make a dev kit using this as well? I could upload the schematic and PCB layout to Circuitmaker (free) and pull request it here?

Cheers! :D

MrBuddyCasino commented 7 years ago

Thanks! Though the code is not nearly as polished as I would have liked.

The PMM-3738-VM1010-R is a mic that can give a wakeup signal on loud noises, right? Since we're not energy constrained, we could do that entirely in software.
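A rough sketch of what "entirely in software" could look like: compute the RMS level of each incoming PCM frame and compare it with a threshold. The frame size, the 16 kHz sample rate, and the threshold value of 3000 are illustrative assumptions, not values taken from this project.

```python
import math
from array import array


def frame_rms(samples):
    """Root-mean-square level of one frame of signed 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_loud(samples, threshold=3000.0):
    """True when the frame's RMS exceeds the threshold (hypothetical value)."""
    return frame_rms(samples) > threshold


# Two 20 ms frames at an assumed 16 kHz sample rate:
# low-level noise, and a 440 Hz tone at a high amplitude.
quiet = array("h", [10, -12, 8, -9] * 80)
loud = array(
    "h",
    [int(12000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(320)],
)

print(is_loud(quiet), is_loud(loud))
```

A detector like this reacts to any loud sound, not to a specific word, so it has the same false-trigger problem as the hardware wake-up signal.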

But I think we can do better. 0xPIT mentioned a library that needs minimal RAM. We'll have to see how accurate it is, but it might be a good start: https://github.com/arjo129/uSpeech

The Alexa API has a feature to verify a wakeword in order to increase accuracy, but for that we'd have to store the audio data so it can be re-sent, and I'm not sure we have enough RAM for that. It might also be hard to pull off.
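To put a number on the RAM question, here is a minimal sketch of a ring buffer that keeps only the most recent audio, plus the arithmetic for its size. The capture format (16 kHz, 16-bit mono PCM) and the durations are assumptions for illustration. `RingBuffer` and `buffer_bytes` are hypothetical names, not part of this project or of the Alexa API.

```python
SAMPLE_RATE_HZ = 16000  # assumed capture rate
BYTES_PER_SAMPLE = 2    # assumed 16-bit mono PCM


def buffer_bytes(duration_ms):
    """Bytes needed to hold duration_ms of audio in the assumed format."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


class RingBuffer:
    """Fixed-size byte buffer that overwrites its oldest data when full."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._write = 0   # index of the next byte to write
        self._filled = 0  # number of valid bytes stored

    def write(self, data):
        for b in data:
            self._buf[self._write] = b
            self._write = (self._write + 1) % self._capacity
            self._filled = min(self._filled + 1, self._capacity)

    def snapshot(self):
        """Return the buffered bytes, oldest first."""
        if self._filled < self._capacity:
            return bytes(self._buf[: self._filled])
        return bytes(self._buf[self._write:] + self._buf[: self._write])


# Half a second of audio needs 16,000 bytes; 1.5 seconds needs 48,000 bytes.
print(buffer_bytes(500), buffer_bytes(1500))
```

Whether a buffer of that size fits alongside the network and audio buffers already in use is something that would have to be measured on the device.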

ibaranov-cp commented 7 years ago

Hmm, I've had a dream of embedding Alexa all over the house in the light switches (every room has 'em, controlling lights is what most people want, and they're large enough to hold a small speaker and mic).

Would it then make more sense to stream audio from each device down to one central Raspberry Pi or similar? Or is it better for each node to make its own request? (That allows simultaneous connections, is cheaper, etc.)

Email me offline at ibaranov@clearpathrobotics.com if you want to pursue this idea further :)

Meanwhile, are you cool with a devkit schematic based on your design ideas here?

MrBuddyCasino commented 7 years ago

To be honest, I'm not sure this can be used for serious stuff. The Alexa smart speaker hardware is quite sophisticated, with many microphones, DSP, beamforming and such. Also, the ESP32 needs a very good Wi-Fi connection to stream audio reliably because of its small buffers. If you're looking for something practical, I think you're better off with a Raspberry Pi Zero W and the Java reference implementation.

> Meanwhile, are you cool with a devkit schematic based on your design ideas here?

Yeah, of course, thanks for asking though. I thought about designing a board myself, so that would save me a lot of effort. If you use the ESP32 module's PCB antenna, make sure to let it extend beyond the edge of the board, because otherwise RF performance will suffer, especially if there's a ground layer.

martinbradford commented 7 years ago

Espressif have been working on wake word detection on the ESP32 since early this year – there is a video on YouTube demonstrating their early attempts. I contacted them to ask for sight of the code and they told me that they were not satisfied with the current performance, but were hoping to have something that they are happy to release soon.

Martin

MrBuddyCasino commented 7 years ago

I have high opinions of Espressif developers, but they tend to be very optimistic regarding their release schedule. ^_^

martinbradford commented 7 years ago

Their video shows it working quite well in Chinese back in January - but the algorithm they were using apparently did not cope so well with English!

ibaranov-cp commented 7 years ago

You are almost certainly right: far-field audio is complex and currently expensive :( http://conexant.com/amazon-avs/ds20924/

For sure, will do, probably a few weeks.

martinbradford commented 7 years ago

A genuine Alexa sells for £50 and includes a pretty impressive far field mic plus a reasonable amount of processing power – so it must be possible to get the costs under control…

Martin

ibaranov-cp commented 7 years ago

For sure, in volume of 1000s ;) The central chip used in the dev kit, CX20924, is $15 on Arrow in quantities of 1. So would be ~$30 all told for the parts. Still fairly pricey though...

MrBuddyCasino commented 7 years ago

The devkit uses AKU242 PDM microphones, interesting. Never heard of them. Seems they belong to Bosch, which are known to produce great sensors: http://www.akustica.com/Files/Admin/PDFs/Product%20Briefs/PB24-1.0%20-%20AKU242%20Product%20Brief.pdf

MrBuddyCasino commented 7 years ago

@ibaranov-cp getting the azimuth of an audio source might be doable in software. See srpphat.cpp in https://github.com/e3e-monitor/e3e-detection-2016/tree/master/src.
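For a feel of the idea with just two microphones, here is a simplified sketch. It uses plain time-domain cross-correlation to find the delay between the channels, which is not the SRP-PHAT method in the linked file, and it assumes a far-field source. The mic spacing, sample rate, and function names are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def best_lag(a, b, max_lag):
    """Lag in samples by which b trails a, chosen to maximise cross-correlation."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(a)):
            m = n + lag
            if 0 <= m < len(b):
                score += a[n] * b[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def azimuth_deg(lag, sample_rate, mic_spacing_m):
    """Angle from broadside, in degrees, for a far-field source."""
    x = SPEED_OF_SOUND * lag / sample_rate / mic_spacing_m
    x = max(-1.0, min(1.0, x))  # clamp so asin stays defined
    return math.degrees(math.asin(x))


# Synthetic test: the second channel is the first delayed by 3 samples.
mic_a = [0] * 10 + [1, 2, 3, 2, 1] + [0] * 10
mic_b = [0] * 3 + mic_a[:-3]

lag = best_lag(mic_a, mic_b, max_lag=5)
print(lag, azimuth_deg(lag, sample_rate=16000, mic_spacing_m=0.1))
```

With real recordings the correlation peak is much less clean because of noise and reverberation, which is part of what the PHAT weighting in SRP-PHAT is meant to address.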

martinbradford commented 7 years ago

The performance of the ESP32 Alexa with a single Adafruit I2S microphone is pretty impressive actually. We were working from our boat last weekend and testing a bread-boarded lash-up - my wife stood on the far side of the pontoon and asked "what's the time?" in a natural voice and Alexa understood her.

What we need is wake-word detection - and that is much more a case of local processing, not the quality of the microphone.