FORTH-ModelBasedTracker / MocapNET

We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) A novel and compact 2D pose NSRM representation. (b) A human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose by also allowing for the decomposition of the body to an upper and lower kinematic hierarchy. This permits the recovery of the human pose even in the case of significant occlusions. (c) An efficient Inverse Kinematics solver that refines the neural-network-based solution providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All the above yield a 33% accuracy improvement on the Human 3.6 Million (H3.6M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance.
https://www.youtube.com/watch?v=Jgz1MRq-I-k

Extending towards HRNet as 2D joint detector #4

timtensor closed this issue 4 years ago

timtensor commented 5 years ago

Hi, first of all, great work! I was wondering if it could be extended to HRNet, as it is supposed to be highly accurate? Here is an implementation of it. I think it is possible to dump a JSON file per frame with the keypoints. It is based on COCO keypoints. Link to the repo: simpleHRNet

There is a demo script here: demo_script

The keypoints are output here: keypoints. The keypoint array has shape Nx17x3, where N is the number of persons. Please let me know what you think about it.

AmmarkoV commented 5 years ago

Hello! Thank you for your kind words!

Any source of 2D joints can be used "out of the box" as long as it provides the following joints: HIP, NECK, HEAD, RSHOULDER, RELBOW, RHAND, LSHOULDER, LELBOW, LHAND, RHIP, RKNEE, RFOOT, LHIP, LKNEE, LFOOT, since these are the joints used to generate the NSDM matrices internally used by the neural network, as seen in the following illustrations.

(Illustrations: the joints used to generate the NSDM matrices.)
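
Since simpleHRNet outputs COCO-17 keypoints, a small conversion step is needed to obtain these joints. Below is a rough Python sketch of one way to do it; the midpoint approximations for HIP/NECK and using the nose for HEAD are assumptions of this sketch rather than anything MocapNET prescribes, and you should verify whether your simpleHRNet version emits (x, y, confidence) or (y, x, confidence) per keypoint:

```python
import numpy as np

# Standard COCO-17 keypoint indices.
COCO = {"nose": 0, "l_shoulder": 5, "r_shoulder": 6, "l_elbow": 7,
        "r_elbow": 8, "l_wrist": 9, "r_wrist": 10, "l_hip": 11,
        "r_hip": 12, "l_knee": 13, "r_knee": 14, "l_ankle": 15,
        "r_ankle": 16}

# Direct correspondences between MocapNET joint names and COCO keypoints.
DIRECT = {"rshoulder": "r_shoulder", "relbow": "r_elbow", "rhand": "r_wrist",
          "lshoulder": "l_shoulder", "lelbow": "l_elbow", "lhand": "l_wrist",
          "rhip": "r_hip", "rknee": "r_knee", "rfoot": "r_ankle",
          "lhip": "l_hip", "lknee": "l_knee", "lfoot": "l_ankle"}

def coco_to_mocapnet_joints(kp, conf_thresh=0.3):
    """kp: (17, 3) array of (x, y, confidence) for one person.
    Returns a dict: joint name -> (x, y, visibility flag)."""
    def as_joint(x, y, c):
        return (float(x), float(y), 1 if c > conf_thresh else 0)

    joints = {name: as_joint(*kp[COCO[coco_name]])
              for name, coco_name in DIRECT.items()}
    # COCO has no hip/neck/head keypoints: approximate hip and neck as
    # midpoints and use the nose for the head (assumptions, not part of
    # MocapNET itself).
    l_hip, r_hip = kp[COCO["l_hip"]], kp[COCO["r_hip"]]
    l_sh, r_sh = kp[COCO["l_shoulder"]], kp[COCO["r_shoulder"]]
    joints["hip"] = as_joint(*((l_hip[:2] + r_hip[:2]) / 2.0),
                             min(l_hip[2], r_hip[2]))
    joints["neck"] = as_joint(*((l_sh[:2] + r_sh[:2]) / 2.0),
                              min(l_sh[2], r_sh[2]))
    joints["head"] = as_joint(*kp[COCO["nose"]])
    return joints
```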

Right now the use case is a real-time demo of a single person, but thanks to the fast-enough evaluation speed of the 2D-to-3D part, multiple persons could be handled by running it iteratively for every detected skeleton (the framerate should be ok for 1-3 persons but gets gradually slower).

The easiest way to do a conversion from an arbitrary 2D joint estimator is, I think, by using the CSV file format -> https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/dataset/sample.csv

If the output is dumped to a CSV file using this format, then it can be very quickly tested through MocapNET using:

./MocapNETJSON --from YourDataset.csv --visualize

The CSV file format is very easy to write and parse (especially from Python). The only caveat and possible pitfall is that the CSV file has normalized coordinates that are expected to have a 1.777 (16:9) aspect ratio, since the original cameras I am targeting are GoPro cameras configured for 1920x1080@120fps+. If you have a different video input resolution, the normalization step will have to respect this aspect ratio. Of course, the code that I use to preserve the aspect ratio regardless of input is included in the repository and can be used for reference: https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/MocapNETLib/jsonMocapNETHelpers.cpp#L498, using the normalizeWhileAlsoMatchingTrainingAspectRatio call: https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/MocapNETLib/jsonMocapNETHelpers.cpp#L174
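
For illustration, here is a minimal Python sketch of one way to normalize coordinates while matching the 1.777 training aspect ratio, essentially letterboxing the frame to 16:9 before dividing by the padded dimensions. This is just one reading of the idea; the authoritative logic is the normalizeWhileAlsoMatchingTrainingAspectRatio code linked above:

```python
TRAINING_ASPECT = 1920.0 / 1080.0  # ~1.777, the GoPro setup used for training

def normalize_point(x, y, width, height):
    """Map a pixel coordinate to [0,1] as if the frame had been padded
    (centered) to the 16:9 training aspect ratio. One possible
    interpretation; see MocapNETLib/jsonMocapNETHelpers.cpp for the
    reference implementation."""
    aspect = width / float(height)
    if aspect < TRAINING_ASPECT:
        # Frame is narrower than 16:9: pad left/right.
        padded_w = height * TRAINING_ASPECT
        return (x + (padded_w - width) / 2.0) / padded_w, y / float(height)
    if aspect > TRAINING_ASPECT:
        # Frame is wider than 16:9: pad top/bottom.
        padded_h = width / TRAINING_ASPECT
        return x / float(width), (y + (padded_h - height) / 2.0) / padded_h
    return x / float(width), y / float(height)
```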

That being said, I will clone HRNET and try it out :) 2D accuracy is very important in two-stage 3D pose estimation! Multiple person 3D tracking would also be very cool!

timtensor commented 5 years ago

Thank you for the reply. I am trying out different permutations as well; not really a pro, learning by trying. I must mention that with the YOLOv3 implementation the accuracy has greatly improved. One thing to note is that the 2D detection part of it is computationally quite heavy.

AmmarkoV commented 5 years ago

The last version of YOLO I had checked out was YOLOv2, but only for detection of objects, not persons. In any case, testing with HRNet would initially be more of an offline experiment, especially since HRNet is Python/PyTorch while this repo is C++/TensorFlow.

timtensor commented 5 years ago

Yes, I totally agree; as a start it should be done on locally saved videos. If I understand correctly (I might be wrong), you need the keypoints per frame as input to the MocapNET module, right?

AmmarkoV commented 5 years ago

Yes, you need at least the hip, neck, head, rshoulder, relbow, rhand, lshoulder, lelbow, lhand, rhip, rknee, rfoot, lhip, lknee, lfoot joint 2D positions, organized as 2DXhip, 2DYhip, Vhip, ..., where V is a visibility flag that is 1 when the joint is visible and 0 when the joint is invisible.

The sample CSV file shows the full joint list received from the OpenPose Body+Hands 2D output.

The full list of inputs has 171 elements (57 triplets of X2D, Y2D, VisibilityFlag).

By populating an std::vector with the 171 values in the correct order and running the runMocapNET call, you get back another vector with the full-body BVH configuration, which needs no inverse kinematics and can be directly used to animate a model.

This can be also visualized from the main application of course http://ammar.gr/mocapnet/mocapnetogl.ogv
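
To make the packaging concrete, here is a rough Python sketch of building the 171-value rows for such a CSV. The joint names and header format below are illustrative placeholders; the real column names and their exact order must be copied from dataset/sample.csv:

```python
import csv

# The 15 joints MocapNET strictly needs; the full sample.csv header has
# 57 joints (OpenPose body+hands), so take the complete list and its
# order from dataset/sample.csv, not from this illustrative subset.
REQUIRED = ["hip", "neck", "head", "rshoulder", "relbow", "rhand",
            "lshoulder", "lelbow", "lhand", "rhip", "rknee", "rfoot",
            "lhip", "lknee", "lfoot"]

def make_row(joints, joint_order):
    """joints: dict name -> (x, y, visibility) in normalized coordinates.
    joint_order: all 57 joint names in sample.csv column order.
    Joints you do not have are zeroed and flagged invisible."""
    row = []
    for name in joint_order:
        x, y, v = joints.get(name, (0.0, 0.0, 0))
        row.extend([x, y, v])
    return row  # 3 values per joint -> 171 values for 57 joints

def write_csv(path, joint_order, frames):
    """frames: list of per-frame joint dicts, one per video frame."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        # Placeholder header names; copy the real ones from sample.csv.
        w.writerow([col for n in joint_order
                    for col in (f"2DX_{n}", f"2DY_{n}", f"visible_{n}")])
        for joints in frames:
            w.writerow(make_row(joints, joint_order))
```

The resulting file can then be fed to ./MocapNETJSON --from YourDataset.csv --visualize as described above.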

timtensor commented 4 years ago

Thanks a lot for the information. I still couldn't manage to extend it to simpleHRNet as the 2D detector. I am doing everything offline at the moment.

AmmarkoV commented 4 years ago

Hello, if you have a small sample CSV file that you generated (like this), I can take a look at it to maybe help you resolve the problem.

AmmarkoV commented 4 years ago

I have given a CSV file example that can be used to package any 2D estimator output and enable its processing by MocapNET. Adding native support for multiple 2D estimators is beyond the scope of this repository, so I am closing this issue! :) In case of questions on how to package 2D input for MocapNET, feel free to open a new issue!