Can this be done? Cat detection via audio and camera on ESP32

gamename commented 3 months ago

Hi,

Emlearn for Micropython looks really cool and has lots of potential.

I have a simple project where I want to listen for a cat meowing, then look (via camera) for a cat. If both are true, I will initiate a text message to myself. I already have a version of this running on a Raspberry Pi with Tensorflow.

Now I want to create a version that runs on a microcontroller (specifically, the esp32-s3-eye).

Do you think this can be done with emlearn for MicroPython?

Thank you, -T

jonnor commented 3 months ago

Hi @gamename. Cool project! I think that this might be doable on ESP32 hardware, though it does extend a bit beyond what emlearn-micropython supports at the moment.

I am quite interested in enabling the audio part, and we already have some pieces of it. So I have opened an issue for that feature: https://github.com/emlearn/emlearn-micropython/issues/6

The image part would probably be best done with an object detection system, or alternatively an image classifier. The best current object detection for MCU/ESP32 is probably FOMO from Edge Impulse (https://docs.edgeimpulse.com/docs/edge-impulse-studio/learning-blocks/object-detection). For image classification, a Convolutional Neural Network would be the way to go. For now, I have decided against adding CNN support to the emlearn C library, as it is non-trivial and there are several other good open implementations (ref https://github.com/emlearn/emlearn/issues/16). However, the only one I know of that is accessible from MicroPython is TFLite Micro, via this project: https://github.com/mocleiri/tensorflow-micropython-examples

Personally, I think that MicroPython bindings for NNoM (https://github.com/majianjia/nnom) could be great. It supports CNN and RNN models with simple C code. So in theory it would be possible to make a library that is just as easy to install as emlearn-micropython, with support for those models. There is no built-in serialization for models, though, so that would probably need to be invented. But I think that these things are outside the scope of the emlearn project for the time being.
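
For reference, the two-stage flow described in the question (listen for a meow first, check the camera only after that, then send a message) could be sketched roughly as below. This is only a sketch under assumptions: `hear_meow`, `see_cat` and `notify` are hypothetical placeholder callables that would wrap whatever audio model, image model and messaging method ends up being used. They are not part of emlearn-micropython or any other library. The cooldown is also an added assumption, to avoid sending a message on every loop iteration while the cat is still there.

```python
def make_cat_alarm(hear_meow, see_cat, notify, cooldown_s=60.0):
    """Return a step(now_s) function implementing audio-then-camera gating.

    hear_meow, see_cat: hypothetical callables returning True/False.
    notify: hypothetical callable taking a message string.
    cooldown_s: assumed minimum time between two notifications.
    """
    state = {"last_sent": None}

    def step(now_s):
        # Cheap check first: skip the camera entirely if no meow was heard.
        if not hear_meow():
            return False
        # Only run the (more expensive) image check after an audio trigger.
        if not see_cat():
            return False
        # Suppress repeated notifications within the cooldown window.
        last = state["last_sent"]
        if last is not None and now_s - last < cooldown_s:
            return False
        notify("Cat detected")
        state["last_sent"] = now_s
        return True

    return step
```

On the device, `step` would be called from the main loop with the current time in seconds. Running the audio check first means the camera and image model are only used when there is a reason to look.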

jonnor commented 3 months ago

I forgot: OpenMV also provides image classification using TFLite models, and it is probably the most mature solution. https://docs.openmv.io/library/omv.tf.html

gamename commented 3 months ago

@jonnor

Thank you, sir. That's great guidance. Much appreciated (it's a lot more than I got out of the Tensorflow folks :( )

jonnor commented 3 months ago

Closing since this has been answered. A separate issue has been opened for the things that are in scope.
