Running Google Mediapipe single body pose tracking models on DepthAI hardware (OAK-1, OAK-D, ...).
The Blazepose landmark models available in this repository are the "full", "lite" and "heavy" versions from mediapipe 0.8.6 (2021/07).
The pose detection model comes from mediapipe 0.8.4 and is compatible with the 3 landmark models (the 0.8.6 version currently cannot be converted into a Myriad blob).
For the challenger Movenet on DepthAI, please visit : depthai_movenet
For an OpenVINO version of Blazepose, please visit : openvino_blazepose
Two modes are available:
Landmark model (Edge mode) | FPS (FPS with 'xyz' option) |
---|---|
Full | 20 (18) |
Lite | 26 (22) |
Heavy | 8 (7) |
For depth-capable devices, when measuring the 3D location of a reference point, more nodes are used and not represented here (2 mono cameras, stereo node, spatial location calculator).
Note: the Edge mode schema is missing a custom NeuralNetwork node between the ImageManip node on the right and the landmark NeuralNetwork. This custom NeuralNetwork runs a very simple model that normalizes (divides by 255) the output image from the ImageManip node. This is a temporary fix that should be removed once the depthai ImageManip node supports setFrameType(RGBF16F16F16p).
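As a rough host-side illustration of what that small normalization model computes, here is a minimal sketch. The real step runs on the device as a NeuralNetwork node; `normalize_pixels` is a hypothetical helper name used only for this example.

```python
def normalize_pixels(pixels):
    """Scale 8-bit channel values (0..255) to floats in [0, 1]."""
    return [value / 255 for value in pixels]


# Three sample channel values: black, a dark tone, and full intensity.
normalized = normalize_pixels([0, 51, 255])
print(normalized)
```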
The image below demonstrates the 3 modes of 3D visualization:
1) Image mode (top-right), based on body.landmarks. Note that the size of the drawn skeleton depends on the camera-body distance, and that the mid hips reference point is restricted: it can only move inside a plane parallel to the wall grid.
2) World mode (bottom-left), based on body.world_landmarks. Note that the mid hips reference point is fixed and the size of the skeleton does not change.
3) Mixed mode (bottom-right), mixing body.world_landmarks with the measured 3D location of the reference point. As in World mode, the size of the skeleton does not change, but the mid hips reference point is no longer restricted.
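To make the difference between World and Mixed modes concrete, here is a minimal sketch. It assumes that the world landmarks are (x, y, z) points expressed relative to the mid hips, and that the measured reference location is given in the same unit. Both the helper name `to_mixed` and the shared-unit assumption are illustrative and not taken from the renderer code.

```python
def to_mixed(world_landmarks, ref_xyz):
    """Translate hip-centered landmarks by the measured reference location."""
    rx, ry, rz = ref_xyz
    return [(x + rx, y + ry, z + rz) for (x, y, z) in world_landmarks]


# Hypothetical values: the mid hips (origin) and one other landmark,
# with the body measured 2 units away from the camera along z.
world = [(0.0, 0.0, 0.0), (0.1, -0.5, 0.0)]
mixed = to_mixed(world, (0.0, 0.0, 2.0))
print(mixed)
```

Because this is a pure translation, the skeleton keeps its size while the reference point follows the measured location.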
Install the python packages (depthai, opencv, open3d) with the following command:
python3 -m pip install -r requirements.txt
Usage:
-> python3 demo.py -h
usage: demo.py [-h] [-e] [-i INPUT] [--pd_m PD_M] [--lm_m LM_M] [-xyz] [-c]
[--no_smoothing] [-f INTERNAL_FPS]
[--internal_frame_height INTERNAL_FRAME_HEIGHT] [-s] [-t]
[--force_detection] [-3 {None,image,mixed,world}]
[-o OUTPUT]
optional arguments:
-h, --help show this help message and exit
-e, --edge Use Edge mode (postprocessing runs on the device)
Tracker arguments:
-i INPUT, --input INPUT
'rgb' or 'rgb_laconic' or path to video/image file to
use as input (default=rgb)
--pd_m PD_M Path to an .blob file for pose detection model
--lm_m LM_M Landmark model ('full' or 'lite' or 'heavy') or path
to an .blob file
-xyz, --xyz Get (x,y,z) coords of reference body keypoint in
camera coord system (only for compatible devices)
-c, --crop Center crop frames to a square shape before feeding
pose detection model
--no_smoothing Disable smoothing filter
-f INTERNAL_FPS, --internal_fps INTERNAL_FPS
Fps of internal color camera. Too high value lower NN
fps (default= depends on the model)
--internal_frame_height INTERNAL_FRAME_HEIGHT
Internal color camera frame height in pixels
(default=640)
-s, --stats Print some statistics at exit
-t, --trace Print some debug messages
--force_detection Force person detection on every frame (never use
landmarks from previous frame to determine ROI)
Renderer arguments:
-3 {None,image,mixed,world}, --show_3d {None,image,mixed,world}
Display skeleton in 3d in a separate window. See
README for description.
-o OUTPUT, --output OUTPUT
Path to output video file
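For readers who want to reuse the same command-line interface in their own script, the following sketch rebuilds a subset of these options with argparse. It is reconstructed from the help text above, not copied from demo.py, so the function name `build_parser`, the types and any default not shown in the help are assumptions.

```python
import argparse


def build_parser():
    """Rebuild a subset of the demo.py options from the help text."""
    parser = argparse.ArgumentParser(prog="demo.py")
    parser.add_argument("-e", "--edge", action="store_true",
                        help="Use Edge mode (postprocessing runs on the device)")

    tracker = parser.add_argument_group("Tracker arguments")
    tracker.add_argument("-i", "--input", type=str, default="rgb",
                         help="'rgb' or 'rgb_laconic' or path to video/image file")
    tracker.add_argument("--lm_m", type=str,
                         help="Landmark model ('full', 'lite', 'heavy') or path to a .blob file")
    tracker.add_argument("-xyz", "--xyz", action="store_true",
                         help="Get (x,y,z) coords of reference body keypoint")
    tracker.add_argument("-f", "--internal_fps", type=int,
                         help="Fps of internal color camera")
    tracker.add_argument("--internal_frame_height", type=int, default=640,
                         help="Internal color camera frame height in pixels")

    renderer = parser.add_argument_group("Renderer arguments")
    renderer.add_argument("-3", "--show_3d",
                          choices=[None, "image", "mixed", "world"], default=None,
                          help="Display skeleton in 3d in a separate window")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-e", "--lm_m", "lite", "-xyz", "-3", "world"])
print(args.edge, args.lm_m, args.xyz, args.show_3d)
```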
Examples :
To use default internal color camera as input with the model "full" in Host mode:
python3 demo.py
To use default internal color camera as input with the model "full" in Edge mode [preferred]:
python3 demo.py -e
To use a file (video or image) as input :
python3 demo.py -i filename
To use the model "lite" :
python3 demo.py --lm_m lite
To measure body spatial location in camera coordinate system (only for depth-capable device like OAK-D):
python3 demo.py -e -xyz
The measure is made only on one reference point:
To show the skeleton in 3D 'world' mode (-xyz flag needed):
python3 demo.py -e -xyz -3 world
Note that the floor and wall grids do not correspond to a real floor and wall. Each grid square size is 1m x 1m.
When using the internal camera, to change its FPS to 15 :
python3 demo.py --internal_fps 15
Note: the default internal camera FPS depends on the model, the mode (Edge vs Host) and the use of depth ("-xyz"). These default values are based on my own observations. Please don't hesitate to play with this parameter to find the optimal value. If you observe that your FPS is well below the default value, you should lower the FPS with this option until the set FPS is just above the observed FPS.
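The tuning rule in the note above can be written down as a tiny helper. This is only a sketch: `suggest_internal_fps` is a hypothetical name, and the margin of 1 FPS is an arbitrary choice for "just above", not a value recommended by this repository.

```python
import math


def suggest_internal_fps(observed_fps, margin=1):
    """Return an integer FPS just above the observed processing rate."""
    return math.floor(observed_fps) + margin


# Example: the demo reports about 17.3 FPS, so try --internal_fps 18.
suggested = suggest_internal_fps(17.3)
print(f"python3 demo.py --internal_fps {suggested}")
```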
When using the internal camera, you probably don't need to work with the full resolution. You can set a lower resolution (and gain a bit of FPS) by using this option:
python3 demo.py --internal_frame_height 450
Note: currently, depthai supports only some possible values for this argument. The value you specify will be replaced by the closest possible value (here 432 instead of 450).
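The "closest possible value" behaviour can be pictured with the sketch below. The list of supported heights is purely illustrative (only 432 comes from the note above); the actual set of valid values is determined by depthai, and `closest_supported` is a hypothetical helper name.

```python
def closest_supported(requested, supported):
    """Return the supported value nearest to the requested one."""
    return min(supported, key=lambda value: abs(value - requested))


# Illustrative list only; the real set of valid heights is decided by depthai.
SUPPORTED_HEIGHTS = [288, 432, 560, 720]
print(closest_supported(450, SUPPORTED_HEIGHTS))
```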
By default, temporal filters smooth the landmark positions. Use --no_smoothing to disable the filter.
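To give an idea of what a temporal filter does to landmark positions, here is a minimal exponential moving average. It is not the filter implemented in this repository, only a generic illustration; the class name `LandmarkSmoother` and the `alpha` parameter are assumptions made for the example.

```python
class LandmarkSmoother:
    """Exponential moving average over successive landmark lists."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha   # 1.0 = no smoothing, close to 0 = heavy smoothing
        self.state = None    # last smoothed landmarks, None before the first frame

    def apply(self, landmarks):
        if self.state is None:
            # First frame: nothing to blend with yet.
            self.state = [tuple(p) for p in landmarks]
        else:
            a = self.alpha
            self.state = [
                tuple(a * new + (1 - a) * old for new, old in zip(p, q))
                for p, q in zip(landmarks, self.state)
            ]
        return self.state


smoother = LandmarkSmoother(alpha=0.5)
first = smoother.apply([(0.0, 0.0)])
second = smoother.apply([(2.0, 4.0)])  # a sudden jump is halved by the filter
print(first, second)
```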
Keypress in OpenCV window | Function |
---|---|
Esc | Exit |
space | Pause |
r | Show/hide the bounding rotated rectangle around the body |
l | Show/hide landmarks |
s | Show/hide landmark score |
f | Show/hide FPS |
x | Show/hide (x,y,z) coordinates (only on depth-capable devices and if using "-xyz" flag) |
z | Show/hide the square zone used to measure depth (only on depth-capable devices and if using "-xyz" flag) |
If using a 3D visualization mode ("-3" or "--show_3d"):
Keypress in Open3d window | Function |
---|---|
o | Oscillating (rotating back and forth) of the view |
r | Continuous rotating of the view |
s | Stop oscillating or rotating |
Up | Increasing rotating or oscillating speed |
Down | Decreasing rotating or oscillating speed |
Right or Left | Change the point of view to a predefined position |
Mouse | Freely change the point of view |
You can directly find the model files (.xml and .bin) under the 'models' directory. Below I describe how to get the files in case you need to regenerate the models.
1) Clone this github repository in a local directory (DEST_DIR).
2) In the DEST_DIR/models directory, download the tflite models from [this archive](https://drive.google.com/file/d/1bEL4zmh2PEFsRfmFOofP0rbGNo-O1p5/view?usp=sharing). The archive contains:
Full, Lite and Heavy pose landmark models from Mediapipe 0.8.6.
Note: the Pose detection model from Mediapipe 0.8.6 can't currently be converted (more info here).
3) Install the amazing PINTO's tflite2tensorflow tool. Use the docker installation, which includes many packages, including a recent version of Openvino.
4) From DEST_DIR, run the tflite2tensorflow container: ./docker_tflite2tensorflow.sh
5) From the running container:
cd workdir/models
./convert_models.sh
The convert_models.sh script converts the tflite models to tensorflow (.pb), then converts the pb files into Openvino IR format (.xml and .bin), and finally converts the IR files into MyriadX format (.blob).
6) By default, the number of SHAVES associated with the blob files is 4. In case you want to generate new blobs with a different number of shaves, you can use the script gen_blob_shave.sh:
# Example: to generate blobs for 6 shaves
./gen_blob_shave.sh -m pd -n 6 # will generate pose_detection_sh6.blob
./gen_blob_shave.sh -m full -n 6 # will generate pose_landmark_full_sh6.blob
Explanation about the Model Optimizer params:
For the pose detection model, --reverse_input_channels converts BGR to RGB, and --mean_values [127.5,127.5,127.5] --scale_values [127.5,127.5,127.5] normalizes the frames between [-1, 1].
For the landmark models, --reverse_input_channels is also used but, unlike the detection model, we choose to do the normalization in the python code and not in the models (via --scale_values). Indeed, we have observed a better accuracy with FP16 models when doing the normalization of the inputs outside of the models (a possible explanation).
The custom_models directory contains the code to build the following custom models:
The method used to build these models is well explained on rahulrav's blog.
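Coming back to the Model Optimizer parameters described earlier for the detection model, the sketch below shows the arithmetic they stand for on a single pixel: the channel order is reversed, then the mean is subtracted and the result divided by the scale. It is a plain Python illustration with a hypothetical helper name (`preprocess_bgr_pixel`), not code from the conversion scripts.

```python
def preprocess_bgr_pixel(bgr, mean=127.5, scale=127.5):
    """Reverse BGR to RGB, then map each 0..255 value to [-1, 1]."""
    b, g, r = bgr
    return tuple((channel - mean) / scale for channel in (r, g, b))


# A pixel with B=0, G=127.5 (mid-range, for illustration), R=255.
print(preprocess_bgr_pixel((0, 127.5, 255)))
```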
To facilitate reusability, the code is split into 2 classes:
# For Host mode:
from BlazeposeDepthai import BlazeposeDepthai
# For Edge mode:
from BlazeposeDepthaiEdge import BlazeposeDepthai
This way, you can replace the renderer from this repository with your own personalized renderer (for some projects, you may not even need a renderer).
The file demo.py is a representative example of how to use these classes:
from BlazeposeDepthaiEdge import BlazeposeDepthai
from BlazeposeRenderer import BlazeposeRenderer

# The argparse stuff has been removed to keep only the important code

tracker = BlazeposeDepthai(input_src=args.input,
            pd_model=args.pd_m,
            lm_model=args.lm_m,
            smoothing=not args.no_smoothing,
            xyz=args.xyz,
            crop=args.crop,
            internal_fps=args.internal_fps,
            internal_frame_height=args.internal_frame_height,
            force_detection=args.force_detection,
            stats=args.stats,
            trace=args.trace)

# The renderer is built from the tracker instance created above
renderer = BlazeposeRenderer(
                tracker,
                show_3d=args.show_3d,
                output=args.output)

while True:
    # Run blazepose on next frame
    frame, body = tracker.next_frame()
    if frame is None: break
    # Draw 2d skeleton
    frame = renderer.draw(frame, body)
    key = renderer.waitKey(delay=1)
    if key == 27 or key == ord('q'):
        break
renderer.exit()
tracker.exit()
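Because the tracker and the renderer are separate classes, a program that only needs the landmarks can drop the renderer entirely. The sketch below shows one way to do that. It relies only on the `next_frame()` and `exit()` calls used in the example above; the helper `run_headless`, the stand-in `FakeTracker` class and the assumption that `body` is None when nobody is detected are illustrative.

```python
def run_headless(tracker, on_body):
    """Pull frames from a tracker and pass each detected body to a callback."""
    try:
        while True:
            frame, body = tracker.next_frame()
            if frame is None:       # end of input
                break
            if body is not None:    # assumption: None means no body was detected
                on_body(body)
    finally:
        tracker.exit()


class FakeTracker:
    """Stand-in for BlazeposeDepthai so the sketch runs without a device."""

    def __init__(self, results):
        self._results = iter(results)
        self.closed = False

    def next_frame(self):
        # Return (None, None) once the scripted results are exhausted.
        return next(self._results, (None, None))

    def exit(self):
        self.closed = True


seen = []
fake = FakeTracker([("frame0", None), ("frame1", "body1")])
run_headless(fake, seen.append)
print(seen)
```

With a real device, the `FakeTracker` instance would be replaced by a `BlazeposeDepthai` instance built as in the example above.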
For more information on:
the tracker, see the class BlazeposeDepthai or BlazeposeDepthaiEdge in BlazeposeDepthai.py or BlazeposeDepthaiEdge.py;
the body object returned by the tracker, see the class Body in mediapipe_utils.py.
Semaphore alphabet |