anatolix / keras_Realtime_Multi-Person_Pose_Estimation

Keras version of Realtime Multi-Person Pose Estimation project

About this fork

This fork contains a pure Python version of Realtime Multi-Person Pose Estimation. It was initially forked from Michal Faber's fork; all credit for porting the original work to Keras goes to him.

In this fork I've reimplemented image augmentation in pure Python. It is significantly shorter (285 lines vs. 1202 lines in Michal Faber's C++ rmpe_server, and far fewer than in the original work).

Despite being written in Python, this code is significantly faster than the original implementation (140 images/s vs. 30 images/s for the C++ code on my machine). This is not really useful, since most people don't have 5 GPUs, but it proves the point that Python programs can be fast. The magic is in combining all affine transformations into one matrix and calling warpAffine a single time, plus vectorized numpy computation of the PAFs and heatmaps.
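The two ideas above can be illustrated with a minimal sketch. This is not the code from this repo: the function names, the order of the transformations, the output size, and the Gaussian `sigma` are all assumptions made for the example. It only shows how several 3x3 matrices collapse into one 2x3 matrix suitable for a single warp, and how heatmaps for all joints can be computed with numpy broadcasting instead of Python loops.

```python
import numpy as np

def compose_affine(center, angle_deg, scale, flip, out_size):
    """Fold translate -> rotate -> scale -> flip -> translate into one 2x3 matrix.

    All parameter names are hypothetical. `center` is the (x, y) point in the
    source image that should land in the middle of the output crop.
    """
    cx, cy = center
    w, h = out_size
    a = np.deg2rad(angle_deg)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=np.float64)
    rotate = np.array([[np.cos(a), -np.sin(a), 0],
                       [np.sin(a),  np.cos(a), 0],
                       [0, 0, 1]], dtype=np.float64)
    zoom = np.diag([scale, scale, 1.0])
    mirror = np.diag([-1.0 if flip else 1.0, 1.0, 1.0])
    to_output = np.array([[1, 0, w / 2], [0, 1, h / 2], [0, 0, 1]], dtype=np.float64)
    # Matrices apply right to left, so `to_origin` acts first.
    combined = to_output @ mirror @ zoom @ rotate @ to_origin
    # The top two rows are what a single call such as
    # cv2.warpAffine(image, M, (w, h)) would take.
    return combined[:2]

def transform_points(M, points):
    """Apply the same 2x3 matrix to an (N, 2) array of (x, y) keypoints."""
    points = np.asarray(points, dtype=np.float64)
    return points @ M[:, :2].T + M[:, 2]

def gaussian_heatmaps(joints, height, width, sigma=7.0):
    """Return a (J, height, width) array with one Gaussian peak per joint.

    `joints` is (J, 2) in (x, y) order; the sigma value is an assumption.
    """
    joints = np.asarray(joints, dtype=np.float64)
    xs = np.arange(width)[None, None, :]    # shape (1, 1, W)
    ys = np.arange(height)[None, :, None]   # shape (1, H, 1)
    jx = joints[:, 0, None, None]           # shape (J, 1, 1)
    jy = joints[:, 1, None, None]
    dist2 = (xs - jx) ** 2 + (ys - jy) ** 2  # broadcasts to (J, H, W)
    return np.exp(-dist2 / (2.0 * sigma ** 2))
```

Because the image and its keypoints go through the same matrix, the annotations stay aligned with the pixels after any combination of rotation, scaling and flipping, and the image is resampled only once.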

It can be run as an iterator inside train_pose.py (the default), or separately as ./rmpe_server.py

Current status

Current work

Realtime Multi-Person Pose Estimation

This is a Keras version of the Realtime Multi-Person Pose Estimation project.

Introduction

Code repo for reproducing the 2017 CVPR paper using Keras.

Results

 

Contents

  1. Converting caffe model
  2. Testing
  3. Training

Requirements

  1. Keras
  2. Caffe - Docker is required if you would like to convert the Caffe model to a Keras model. You don't have to compile or install Caffe on your local machine.

Converting Caffe model to Keras model

The authors of the original implementation have released an already trained Caffe model, which you can use to extract the weights.

Testing steps

Training steps

UPDATE 26/10/2017

Fixed a problem with the training procedure. Here are my results after training for 5 epochs = 25000 iterations (1 epoch is ~5000 batches). The loss values are quite similar to those in the original training (see output.txt).

Results of running demo_image --image sample_images/ski.jpg --model training/weights.best.h5 with a model trained for only 25000 iterations. Not too bad! Training on my single 1070 GPU took around 10 hours.

UPDATE 22/10/2017

Augmented samples are now fetched from the server. The network never sees the same image twice, which was a problem with the previous approach (the rmpe_dataset_transformer tool). This allows you to run augmentation locally or on a separate node. You can start 2 instances, one serving the training set and a second one serving the validation set (on a different port if running locally).
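The "never sees the same image twice" property can be sketched as a generator that draws fresh augmentation parameters on every request instead of reading pre-augmented files. This is a simplified illustration, not the server in this repo: it leaves out the network transport entirely, and the function name, parameter names and value ranges are assumptions chosen for the example.

```python
import numpy as np

def augmented_stream(samples, seed=None):
    """Yield (sample, params) forever, with new random parameters each time.

    `samples` is any indexable collection; the parameter ranges below are
    illustrative placeholders, not the values used for training.
    """
    rng = np.random.default_rng(seed)
    while True:
        # Reshuffle on every pass so the sample order also changes per epoch.
        for idx in rng.permutation(len(samples)):
            params = {
                "angle_deg": rng.uniform(-40.0, 40.0),
                "scale": rng.uniform(0.5, 1.1),
                "flip": bool(rng.random() < 0.5),
            }
            yield samples[idx], params
```

A training set and a validation set would each get their own stream, which corresponds to the two instances described above.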

Related repository

Citation

Please cite the paper in your publications if it helps your research:

@InProceedings{cao2017realtime,
  title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
  author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2017}
  }