goeddea opened this issue 5 years ago
related to / kind of a superset of https://github.com/crossbario/iotcookbook/issues/27
Let me add some info and background from my side, in particular regarding the ML stuff.
In ML, software is often split into two pieces: a) training/learning and b) detection/run-time.
"face detection component": this would be the b) in above. It needs to load/access an already trained model, and only apply that model to new incoming data and output prediction ("Is this picture/video frame a human face, yes or no?")
so we actually should have two components for the ML part:

- a training component
- a detection/run-time component

For the demo, pattern == human face is perfect, but we should design the components in a way that generalizes to arbitrary patterns (see below).
The specific ML algorithm that we should use for this is "Haar cascades". A good intro can be found here: http://www.willberger.org/cascade-haar-explained/
The output (when using OpenCV for Haar cascades via `cv2.CascadeClassifier`) is exactly one XML file == the trained model.
The OpenCV project provides a bunch of ready-to-use trained models here: https://github.com/opencv/opencv/blob/master/data/haarcascades/
One model provided is `haarcascade_frontalface_default.xml`, which is a Haar cascade model trained to detect human faces.
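To make the run-time side concrete, here is a minimal sketch of applying such a model with OpenCV; the input file name and the detection parameters are illustrative only:

```python
# Minimal sketch: load a trained Haar cascade model and apply it to one image.
# Assumes OpenCV ("pip install opencv-python") and the model XML downloaded
# from the opencv repo linked above; "frame.png" is a placeholder input.
import cv2

model = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Returns a list of (x, y, w, h) bounding boxes, one per detected face.
faces = model.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    print("face at x={} y={} w={} h={}".format(x, y, w, h))
```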
The model file, being XML, is verbose, and can be compressed ~10x: https://gist.github.com/oberstet/5f91645cb6d4497676b8cca7b83d12e5
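For illustration, one quick way to compress the model for transport/storage is stdlib `zlib` (the gist above may use a different codec and get a different ratio):

```python
# Sketch: compress the XML model file; assumes it is present locally.
import zlib

with open("haarcascade_frontalface_default.xml", "rb") as f:
    xml = f.read()

compressed = zlib.compress(xml, 9)
print("raw: {} bytes, compressed: {} bytes ({:.1f}x)".format(
    len(xml), len(compressed), len(xml) / len(compressed)))
```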
The training component (different from the run-time component) essentially needs to:

- collect/accept training sets of labeled images (positive and negative samples)
- train a Haar cascade on these, producing the XML model (with OpenCV, e.g. via the `opencv_traincascade` tool)
- push the resulting model into the run-time component, e.g. by calling `store_model` (see below)
The detection run-time component (processing the live video frames) needs to:

- load a previously stored model
- receive the live video frames from the camera capture component
- apply the model to each frame
- publish the resulting detections (e.g. face positions)
We could for example have these WAMP procs in the ML run-time component (sketched below):

- `store_model(compressed_xml, label, description) -> UUID` (= SHA fingerprint of the XML): store the XML locally on the run-time component's disk
- `load_model(UUID) -> ok|error`: load a previously stored model - only works if no model is currently running
- `run_model() -> ok|error`: start the previously loaded model; it will begin to process live video frames (received from the camera capture component)
- `stop_model() -> ok|error`: stop the currently running model (if any)
- `list_models() -> [{UUID, label, description}]`: list the stored models
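A rough sketch of what that API could look like as an Autobahn|Python (asyncio) component. The URIs, the SHA-256 fingerprinting, and the in-memory bookkeeping are all just assumptions for illustration, not a fixed design:

```python
# Sketch of the ML run-time component's WAMP API with Autobahn|Python.
# URIs and the SHA-256 fingerprint scheme are assumptions; models are kept
# in memory here instead of on disk for brevity.
import hashlib

from autobahn.asyncio.wamp import ApplicationSession, ApplicationRunner


class MLRuntimeComponent(ApplicationSession):

    def __init__(self, config=None):
        super().__init__(config)
        self._models = {}    # UUID -> (compressed_xml, label, description)
        self._loaded = None  # UUID of the currently loaded model, if any
        self._running = False

    async def onJoin(self, details):
        for proc, uri in [(self.store_model, 'com.example.ml.store_model'),
                          (self.load_model, 'com.example.ml.load_model'),
                          (self.run_model, 'com.example.ml.run_model'),
                          (self.stop_model, 'com.example.ml.stop_model'),
                          (self.list_models, 'com.example.ml.list_models')]:
            await self.register(proc, uri)

    def store_model(self, compressed_xml, label, description):
        # UUID == SHA fingerprint of the (compressed) XML, as suggested above.
        # compressed_xml is assumed to arrive as bytes (binary serializer).
        uuid = hashlib.sha256(compressed_xml).hexdigest()
        self._models[uuid] = (compressed_xml, label, description)
        return uuid

    def load_model(self, uuid):
        # Only works if no model is currently running.
        if self._running or uuid not in self._models:
            return 'error'
        self._loaded = uuid
        return 'ok'

    def run_model(self):
        if self._loaded is None or self._running:
            return 'error'
        # Real implementation: start consuming live video frames here.
        self._running = True
        return 'ok'

    def stop_model(self):
        self._running = False
        return 'ok'

    def list_models(self):
        return [{'UUID': uuid, 'label': label, 'description': description}
                for uuid, (_, label, description) in self._models.items()]


if __name__ == '__main__':
    ApplicationRunner('ws://localhost:8080/ws', 'realm1').run(MLRuntimeComponent)
```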
And then e.g. have the ML training component call into `store_model`, etc.
some more links:
The reason for the above generalization (arbitrary patterns vs. only faces, plus separate run-time and training components) is that it makes this actually much more than a demo!
E.g. further down the line, we could add a UI that allows an end user to upload and define training sets of arbitrary pictures/images for other applications. Because face detection is obviously not something an industrial user would practically do, whereas "broken parts vs. ok parts" is actually very, very relevant.
For the initial version, we can use the existing model that OpenCV provides.
We want an application which provides detection of human faces in a live video stream and can show this in a browser-based frontend.
This is microservice-based, and I see three components as part of this:

- a camera capture component, which captures the live video stream and publishes the raw frames
- a face detection component, which applies the trained model to the frames and publishes the detected face positions
- a video display component, which renders a given stream in the browser-based frontend
Interaction: the camera capture component publishes the raw video frames; the face detection component subscribes to these and publishes the detected face positions; the video display component subscribes to either stream and renders it.
With this, we can use the video display component twice, once to show the raw image stream, once to show the face positions as part of our demo.
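As a sketch of how that wiring could look on the detection side (topic URIs made up for illustration; the timecode pass-through anticipates the coordination question below):

```python
# Sketch: the face detection component subscribes to raw frames and
# re-publishes face positions under the same timecode. Topic URIs are
# illustrative; detection itself is elided (see the OpenCV sketch above).
from autobahn.asyncio.wamp import ApplicationSession


class FaceDetectionComponent(ApplicationSession):

    async def onJoin(self, details):
        await self.subscribe(self.on_frame, 'com.example.camera.on_frame')

    def on_frame(self, timecode, frame):
        faces = self.detect(frame)
        # Same timecode on both streams lets the display component pair them.
        self.publish('com.example.ml.on_faces', timecode, faces)

    def detect(self, frame):
        # Placeholder: run the loaded Haar cascade model on the frame.
        return []
```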
My initial (naive) assumption regarding coordination between the two data streams is that it happens via a timecode generated by the camera capture component, which is then also attached to the face detection data stream. The video display component can then cache either stream until the required pairs are present.
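A minimal sketch of that caching/pairing logic (written in Python for brevity; the real video display component would run in the browser):

```python
# Sketch: buffer both streams keyed by the camera-generated timecode and
# only render a frame once its matching face positions have arrived.
class FramePairer:

    def __init__(self, render):
        self._frames = {}      # timecode -> raw frame
        self._detections = {}  # timecode -> list of face positions
        self._render = render  # callback(frame, faces)

    def on_frame(self, timecode, frame):
        self._frames[timecode] = frame
        self._try_emit(timecode)

    def on_faces(self, timecode, faces):
        self._detections[timecode] = faces
        self._try_emit(timecode)

    def _try_emit(self, timecode):
        # Render only when both halves of the pair are present.
        if timecode in self._frames and timecode in self._detections:
            self._render(self._frames.pop(timecode),
                         self._detections.pop(timecode))
```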
@om26er - does the above sound reasonable?