C++ code repo for the ECCV 2016 demo, "Realtime Multiperson Pose Estimation", Zhe Cao, Shih-En Wei, Tomas Simon, Yaser Sheikh. Thanks Ginés Hidalgo Martínez for restructuring the code.
The full project repo includes matlab and python version, and training code.
This project is under the terms of the license.
# OPENCV_VERSION := 3
on the file Makefile.config.Ubuntu14.example
(for Ubuntu 14) and/or Makefile.config.Ubuntu16.example
(for Ubuntu 15 or 16). In addition, OpenCV 3 does not incorporate the opencv_contrib
module by default. Assuming you have manually installed it and you need to use it, append opencv_contrib
at the end of the line LIBRARIES += opencv_core opencv_highgui opencv_imgproc
in the Makefile
file.caffe
& rtpose.bin
+ download the required caffe models (script tested on Ubuntu 14.04 & 16.04, it uses all the available cores in your machine):**
chmod u+x install_caffe_and_cpm.sh
./install_caffe_and_cpm.sh
./build/examples/rtpose/rtpose.bin --video video_file.mp4
./build/examples/rtpose/rtpose.bin
--help
<--- It displays all the available options.
--video input.mp4
<--- Input video. If omitted, will use webcam.
--camera #
<--- Choose webcam number (default: 0).
--image_dir path_to_images/
<--- Run on all jpg, png, or bmp images in path_to_images/
. If omitted, will use webcam.
--write_frames path/
<--- Render images with this prefix: path/frame%06d.jpg
--write_json path/
<--- Output JSON file with joints with this prefix: path/frame%06d.json
--no_frame_drops
<--- Don't drop frames. Important for making offline results.
--no_display
<--- Don't open a display window. Useful if there's no X server.
--num_gpu 4
<--- Parallelize over this number of GPUs. Default is 1.
--num_scales 3 --scale_gap 0.15
<--- Use 3 scales, 1, (1-0.15), (1-0.15*2). Default is one scale=1.
(HD)
--net_resolution 656x368 --resolution 1280x720
(These are the default values.)
(VGA)
--net_resolution 496x368 --resolution 640x480
--logtostderr
<--- Log messages to standard error.
Run on a video vid.mp4
, render image frames as output/frame%06d.jpg
and output JSON files as output/frame%06d.json
, using 3 scales (1.00, 0.85, and 0.70), parallelized over 2 GPUs:
./build/examples/rtpose/rtpose.bin --video vid.mp4 --num_gpu 2 --no_frame_drops --write_frames output/ --write_json output/ --num_scales 3 --scale_gap 0.15
Each JSON file has a bodies
array of objects, where each object has an array joints
containing the joint locations and detection confidence formatted as x1,y1,c1,x2,y2,c2,...
, where c
is the confidence in [0,1].
{
"version":0.1,
"bodies":[
{"joints":[1114.15,160.396,0.846207,...]},
{"joints":[...]},
]
}
where the joint order of the COCO parts is: (see src/rtpose/modelDescriptorFactory.cpp )
part2name {
{0, "Nose"},
{1, "Neck"},
{2, "RShoulder"},
{3, "RElbow"},
{4, "RWrist"},
{5, "LShoulder"},
{6, "LElbow"},
{7, "LWrist"},
{8, "RHip"},
{9, "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "REye"},
{15, "LEye"},
{16, "REar"},
{17, "LEar"},
{18, "Bkg"},
}
We modified and added several Caffe files in include/caffe
and src/caffe
. In case you want to use your own Caffe distribution, these are the files we added and modified:
include/caffe
and src/caffe
: include/caffe/cpm
and src/caffe/cpm
.include/caffe
(search for // CPM extra code:
to find the modified code): data_transformer.hpp
.src/caffe
(search for // CPM extra code:
to find the modified code): data_transformer.cpp
, proto/caffe.proto
and util/blocking_queue.cpp
.README.md
.install_caffe_and_cpm.sh
, Makefile.config.Ubuntu14.example
(extracted from Makefile.config.example
) and Makefile.config.Ubuntu16.example
(extracted from Makefile.config.example
).model/
, examples/rtpose
, /include/rtpose
and /src/rtpose
.Makefile
.Makefile.config.example
, data/
, examples/
(do not delete examples/rtpose
) and models/
.We created a few Caffe layers (located in include/caffe/cpm/layers
and src/caffe/cpm/layers
):
num
x channels
x height_input
x width_input
), the layer returns a 4-D output of size (num
x channels
x height_output
x width_output
). It is independently applied to each dimension of num
and channels
. Its parameters are:
factor
: Scaling factor with respect to the input width and height. factor
is the alternative to the pair of variables [target_spatial_width
, target_spatial_height
]. If factor != 0
, the latter are ignored.scale_gap
and start_scale
: These parameters are related and used for doing scale search in testing mode. If start_scale = 1
(default), the CNN input patch size is the net resolution (set with --net_resolution
). scale_gap
is used to calculate the scale difference between scales. This parameters are related with the flag --num_scales
. For instance, using --start_scale 1 --num_scales 3 --scale_gap 0.1
means using 3 scales: 1, 1-0.1, 1-2*0.1, hence the different patch sizes correspond to the net resolution multiplied by these scales values.target_spatial_height
: Alternative to factor
. It sets the output height. Ignored if factor != 0
.target_spatial_width
: Alternative to factor
. It sets the output width. Ignored if factor != 0
.num
x channels
x height
x width
), it returns a 4-D output of size (num
x num_parts
x max_peaks+1
x 3
). It is independently applied to each dimension of num
. The seconds dimension corresponds to the number of limbs (num_parts
). The third dimension indicates the maximum number of peaks to be analyzed (max_peaks+1
). Finally, the last one corresponds to the x
, y
and score
values (3
). Its parameters are:
max_peaks
: The number of peaks to be considered. The last total_peaks
- max_peaks
peaks are discarded.num_parts
: The number of limbs to detect (e.g. 15 for MPI and 18 for COCO).threshold
: Any input value smaller than this threshold is set to 0.Please cite the paper in your publications if it helps your research:
@article{cao2016realtime,
title={Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
author={Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
journal={arXiv preprint arXiv:1611.08050},
year={2016}
}
@inproceedings{wei2016cpm,
author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
booktitle = {CVPR},
title = {Convolutional pose machines},
year = {2016}
}