geaxgx / depthai_handface

MIT License

Face and hand tracking with DepthAI

Running Google MediaPipe Face Mesh and Hand Tracking models on Luxonis DepthAI hardware (OAK-D, OAK-D Lite, OAK-1, ...). Tracking is limited to one face and two hands. Hand tracking is optional and can be disabled by setting the nb_hands argument to 0.

Demo

The models used in this repository are:

The original models are TFLite models; all of them have been converted to ONNX with PINTO's tflite2tensorflow. Whenever possible, the post-processing of the model outputs has been integrated into the models themselves, thanks to PINTO's simple-onnx-processing-tools. As a result, Non-Maximum Suppression for the face detection and palm detection models, as well as some calculations on the 468 or 478 face landmarks, are performed inside the models. The alternative would have been to do these calculations on the host or in a script node on the device (slower).
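For readers unfamiliar with the Non-Maximum Suppression step mentioned above, here is a minimal host-side sketch in NumPy. It is a plain greedy NMS written for illustration only: the function names, the `[x1, y1, x2, y2]` box layout and the 0.5 threshold are assumptions, and the variant actually fused into the converted models may differ.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Clip at 0 so that non-overlapping boxes get a zero intersection
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, best score first.

    Hypothetical helper for illustration; assumes boxes with non-zero area.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # candidate indices, highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the kept box too much
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


# Two heavily overlapping detections and one distinct detection
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Doing this inside the model means the host only receives the surviving detections instead of every raw candidate box.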

Install

Install the Python packages (depthai, opencv) with the following command:

python3 -m pip install -r requirements.txt

Run

Usage:

->./demo.py -h
usage: demo.py [-h] [-i INPUT] [-a] [-p] [-2] [-n {0,1,2}] [-xyz]
               [-f INTERNAL_FPS]
               [--internal_frame_height INTERNAL_FRAME_HEIGHT] [-t [TRACE]]
               [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit

Tracker arguments:
  -i INPUT, --input INPUT
                        Path to video or image file to use as input (if not
                        specified, use OAK color camera)
  -a, --with_attention  Use face landmark with attention model
  -p, --use_face_pose   Calculate the face pose tranformation matrix and
                        metric landmarks
  -2, --double_face     EXPERIMENTAL. Run a 2nd occurence of the face landmark
                        Neural Network to improve fps. Hand tracking is
                        disabled.
  -n {0,1,2}, --nb_hands {0,1,2}
                        Number of hands tracked (default=2)
  -xyz, --xyz           Enable spatial location measure of hands and face
  -f INTERNAL_FPS, --internal_fps INTERNAL_FPS
                        Fps of internal color camera. Too high value lower NN
                        fps (default= depends on the model)
  --internal_frame_height INTERNAL_FRAME_HEIGHT
                        Internal color camera frame height in pixels
  -t [TRACE], --trace [TRACE]
                        Print some debug infos. The type of info depends on
                        the optional argument.

Renderer arguments:
  -o OUTPUT, --output OUTPUT
                        Path to output video file
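As an aid to reading the help text above, the sketch below shows how a command-line interface with the same shape could be declared with Python's argparse. It is not the code of demo.py: only a subset of the options is reproduced, and the value types, the `const` used for `-t` and the `build_parser` name are assumptions made for the example.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring part of the usage text above."""
    parser = argparse.ArgumentParser(prog="demo.py")

    # Argument groups produce the "Tracker arguments" / "Renderer arguments" headings
    tracker = parser.add_argument_group("Tracker arguments")
    tracker.add_argument("-i", "--input",
                         help="Path to video or image file to use as input")
    tracker.add_argument("-a", "--with_attention", action="store_true",
                         help="Use face landmark with attention model")
    tracker.add_argument("-n", "--nb_hands", type=int, choices=[0, 1, 2], default=2,
                         help="Number of hands tracked (default=2)")
    tracker.add_argument("-xyz", "--xyz", action="store_true",
                         help="Enable spatial location measure of hands and face")
    # Assumption: the FPS is an integer
    tracker.add_argument("-f", "--internal_fps", type=int,
                         help="Fps of internal color camera")
    # nargs="?" gives the "[-t [TRACE]]" form: flag alone, or flag plus a value.
    # Assumption: integer trace level, 1 when the flag is given without a value.
    tracker.add_argument("-t", "--trace", nargs="?", type=int, const=1, default=0,
                         help="Print some debug infos")

    renderer = parser.add_argument_group("Renderer arguments")
    renderer.add_argument("-o", "--output", help="Path to output video file")
    return parser


# Parse an explicit argument list instead of sys.argv, to keep the example self-contained
args = build_parser().parse_args(["-n", "0", "-xyz", "-o", "out.mp4"])
print(args.nb_hands, args.xyz, args.output)  # 0 True out.mp4
```

Here `-n 0` corresponds to disabling hand tracking, as described in the introduction.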

Some examples:

Keypress  Function
Esc       Exit
space     Pause
1         Show/hide the rotated bounding box around the hand
2         Show/hide the hand landmarks
3         Show/hide the rotated bounding box around the face
4         Show/hide the face landmarks
5         Show/hide the hand spatial location (-xyz)
6         Show/hide the zone used to measure the spatial location (small purple square) (-xyz)
g         Show the recognized hand gesture (--gesture)
f         Switch between the face landmark rendering modes
m         Switch between the face metric landmark rendering modes (-p)
p         Switch between the face pose rendering modes (-p)
s         Apply a smoothing filter on the metric landmarks (-p)
h         Switch between the hand landmark rendering modes
b         Draw the landmarks on a black background

Face landmarks

Click on the image below to visualize the 468 landmarks (source).

Face landmarks

Hand landmarks

You can find a description there.

Code

The code relies on two classes: HandFaceTracker and HandFaceRenderer.

A typical usage scenario:

# Create the tracker and the renderer
tracker = HandFaceTracker(...)

renderer = HandFaceRenderer(...)

while True:
    # Get the next frame together with the detected faces and hands
    frame, faces, hands = tracker.next_frame()
    if frame is None:
        break
    # Draw face and hands
    frame = renderer.draw(frame, faces, hands)
    key = renderer.waitKey(delay=1)
    if key == ord('q'):
        break

Let's focus on:

frame, faces, hands = tracker.next_frame()

The classes Face and HandRegion are described in mediapipe_utils.py.

The schema below describes the two default types of face landmarks: landmarks and norm_landmarks (in the Face class):

Face landmarks

With the -p or --use_face_pose argument, a third type of landmarks becomes available: metric_landmarks. They correspond to the 3D runtime face metric landmarks (unit: cm), aligned with the canonical metric face landmarks. In the figure below, the metric landmarks are drawn on the right side, and the axes of the coordinate system (C.S.) in which they are expressed are drawn on the left side. Note that the origin of the coordinate system is inside the head:

Metric landmarks
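Because the metric landmarks are expressed in centimeters, real-world distances on the face can be computed directly from them. The sketch below is an illustration under assumptions: it supposes that metric_landmarks can be read as an array of shape (N, 3) holding x, y, z in cm, and `landmark_distance_cm` is a hypothetical helper, not a function of this repository. The indices are placeholders and the points are synthetic.

```python
import numpy as np


def landmark_distance_cm(metric_landmarks, i, j):
    """Euclidean distance in cm between landmarks i and j.

    Assumption: metric_landmarks is array-like with shape (N, 3), in cm.
    """
    pts = np.asarray(metric_landmarks, dtype=float)
    return float(np.linalg.norm(pts[i] - pts[j]))


# Synthetic stand-in for metric landmarks: two points 5 cm apart
fake_metric_landmarks = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
print(landmark_distance_cm(fake_metric_landmarks, 0, 1))  # 5.0
```

With a real tracker, the same helper would be applied to the metric_landmarks of a detected face, provided the tracker was started with -p.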

Pipeline

A few explanations on the pipeline:

Examples

Blink detection
Blender puppet

Credits