crossbario / iotcookbook

Crossbar.io IoT Cookbook
MIT License

create face detection demo app #29

Open goeddea opened 5 years ago

goeddea commented 5 years ago

We want an application which provides detection of human faces in a live video stream and can show this in a browser-based frontend.

This is microservice-based, and I see three components as part of this:

Interaction:

With this, we can use the video display component twice, once to show the raw image stream, once to show the face positions as part of our demo.

My initial (naive) assumption is that the two data streams are coordinated via a timecode generated by the camera capture component, which the face detection component then reuses for its own data stream. The video display component can then cache items from either stream until the matching pair is present.
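A minimal sketch of that pairing cache, assuming each frame and each detection result arrives tagged with the same timecode. The class name `PairCache`, its methods, and the `max_pending` limit are hypothetical and only illustrate the idea:

```python
from collections import OrderedDict


class PairCache:
    """Buffers frames and detection results until both exist for a timecode."""

    def __init__(self, max_pending=100):
        self._frames = OrderedDict()      # timecode -> frame data
        self._detections = OrderedDict()  # timecode -> list of (x, y, w, h)
        self._max_pending = max_pending

    def add_frame(self, timecode, frame):
        """Return (frame, boxes) if the detections already arrived, else None."""
        if timecode in self._detections:
            return frame, self._detections.pop(timecode)
        self._put(self._frames, timecode, frame)
        return None

    def add_detections(self, timecode, boxes):
        """Return (frame, boxes) if the frame already arrived, else None."""
        if timecode in self._frames:
            return self._frames.pop(timecode), boxes
        self._put(self._detections, timecode, boxes)
        return None

    def _put(self, store, timecode, value):
        store[timecode] = value
        # Drop the oldest entries so unmatched items cannot grow the cache forever.
        while len(store) > self._max_pending:
            store.popitem(last=False)
```

The display component would call `add_frame` and `add_detections` from its two subscriptions and render whenever one of them returns a pair. The raw-stream instance of the display would not need the cache at all.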

@om26er - does the above sound reasonable?

goeddea commented 5 years ago

related to (and kind of a superset of) https://github.com/crossbario/iotcookbook/issues/27

oberstet commented 5 years ago

Let me add some info and background from my side, in particular regarding the ML stuff.

In ML, software is often split into 2 pieces: a) training/learning and b) detection/run-time.

"face detection component": this would be b) above. It needs to load/access an already trained model, and only applies that model to new incoming data and outputs a prediction ("Is this picture/video frame a human face, yes or no?")

so we actually should have 2 components for the ML part:

for the demo, pattern == human face is perfect. but we should design the components in a way that generalizes to arbitrary patterns (see below)


The specific ML algorithm that we should use for this is "Haar cascades". A good intro can be found here: http://www.willberger.org/cascade-haar-explained/

The output of training a Haar cascade with OpenCV is exactly 1 XML file == trained model, which is then loaded at run-time via cv2.CascadeClassifier.

The OpenCV project provides a bunch of ready-to-use trained models here: https://github.com/opencv/opencv/blob/master/data/haarcascades/

One model provided is haarcascade_frontalface_default.xml, which is a haar cascade model trained to detect human faces.
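For illustration, a rough sketch of how the run-time side could apply that model to a single frame. It assumes the `opencv-python` package, which ships the cascade files under `cv2.data.haarcascades`. The function names, the `timecode` field, and the `scaleFactor`/`minNeighbors` values are assumptions for the example, not something decided in this issue:

```python
def load_default_face_classifier():
    """Load OpenCV's bundled frontal-face Haar cascade (needs opencv-python)."""
    import cv2  # imported lazily so detect() below stays usable without OpenCV

    path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    classifier = cv2.CascadeClassifier(path)
    if classifier.empty():
        raise RuntimeError("could not load cascade from " + path)
    return classifier


def detect(classifier, gray_frame, timecode):
    """Apply a loaded cascade to one grayscale frame and return plain data."""
    boxes = classifier.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    return {
        "timecode": timecode,
        # Convert to plain ints so the result can be serialized and published.
        "boxes": [tuple(int(v) for v in box) for box in boxes],
    }
```

Each box is `(x, y, width, height)` in pixel coordinates of the frame, so the display component only needs the timecode and the boxes to draw the overlay.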

Being XML, it is verbose, and can be compressed 10x: https://gist.github.com/oberstet/5f91645cb6d4497676b8cca7b83d12e5
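As a sketch of what compressing the model before sending it around could look like: the snippet below uses `zlib` from the Python standard library, which is an assumption here and not necessarily what the gist uses. The helper names are hypothetical, and the actual ratio depends on the model file:

```python
import zlib


def compress_model(xml_bytes):
    """Compress a cascade XML file (as bytes) at the highest zlib level."""
    return zlib.compress(xml_bytes, 9)


def decompress_model(blob):
    """Restore the original XML bytes from compress_model() output."""
    return zlib.decompress(blob)
```

The compression is lossless, so `decompress_model(compress_model(xml)) == xml`, and a fingerprint computed over the XML stays valid after a round trip.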


The training component (different from the run-time component) essentially needs to do:

The detection run-time component (processing the live video frames) needs to do:

We could for example have WAMP procs in the ML run-time component:

  1. store_model(compressed_xml, label, description) -> UUID (= SHA fingerprint of XML): store the XML locally on the run-time component's disk
  2. load_model(UUID) -> ok|error: load a previously stored model - only works if no model is currently running
  3. run_model() -> ok|error: start the previously loaded model, which will begin to process live video frames (received from the camera capture component)
  4. stop_model() -> ok|error: stop the currently running model (if any)
  5. list_models() -> [{UUID, label, description}]

And then e.g. have the ML training component call into store_model etc.
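To make the state handling of those five procs concrete, here is a plain-Python sketch without any WAMP registration. Several things are assumptions: SHA-256 as the "SHA fingerprint", zlib as the compression, an in-memory dict instead of the on-disk store, and the class name `ModelRuntime`. Actually processing video frames is left out:

```python
import hashlib
import zlib


class ModelRuntime:
    """Tracks stored models plus which one is loaded and whether it is running."""

    def __init__(self):
        self._models = {}      # model id -> {"xml", "label", "description"}
        self._loaded = None    # id of the currently loaded model, if any
        self._running = False

    def store_model(self, compressed_xml, label, description):
        xml = zlib.decompress(compressed_xml)
        model_id = hashlib.sha256(xml).hexdigest()  # fingerprint of the XML
        self._models[model_id] = {
            "xml": xml, "label": label, "description": description,
        }
        return model_id

    def load_model(self, model_id):
        # Loading only works for a known model while no model is running.
        if self._running or model_id not in self._models:
            return "error"
        self._loaded = model_id
        return "ok"

    def run_model(self):
        if self._loaded is None or self._running:
            return "error"
        self._running = True
        return "ok"

    def stop_model(self):
        # "if any": stopping when nothing runs is treated as a no-op here.
        self._running = False
        return "ok"

    def list_models(self):
        return [
            {"UUID": mid, "label": m["label"], "description": m["description"]}
            for mid, m in self._models.items()
        ]
```

In the real component each method would be registered as a WAMP procedure, and the training component would call `store_model` remotely with the compressed XML it produced.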


some more links:

oberstet commented 5 years ago

the reason for the above generalization (arbitrary patterns rather than only faces, plus separate run-time and training components) is: doing so makes this actually much more than a demo!

e.g. further down the line we could add a UI that allows an end user to upload and define training sets of arbitrary pictures/images for other applications:

because: face detection is obviously not something an industrial user would practically do. however, "broken parts vs ok parts" is actually very, very relevant

goeddea commented 5 years ago

For the initial version, which can use the existing model that OpenCV provides