FORTH-ModelBasedTracker / MocapNET

We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) A novel and compact 2D pose NSRM representation. (b) A human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose by also allowing for the decomposition of the body to an upper and lower kinematic hierarchy. This permits the recovery of the human pose even in the case of significant occlusions. (c) An efficient Inverse Kinematics solver that refines the neural-network-based solution providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All the above yield a 33% accuracy improvement on the Human 3.6 Million (H3.6M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance.
https://www.youtube.com/watch?v=Jgz1MRq-I-k
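As a rough illustration of the orientation-classifier/ensemble idea described in point (b) above (the real implementation is the neural-network C++ code in this repository; the joint split, orientation labels, and heuristic below are assumptions made purely for this sketch):

```python
# Illustrative sketch only: MocapNET itself is implemented in C++, and the
# real classifier and regressors are neural networks. The joint split, the
# orientation heuristic, and all names here are assumptions for the sketch.
import numpy as np

UPPER_BODY = ["head", "neck", "rshoulder", "relbow", "rhand",
              "lshoulder", "lelbow", "lhand"]
LOWER_BODY = ["hip", "rhip", "rknee", "rfoot", "lhip", "lknee", "lfoot"]

def classify_orientation(pose2d):
    """Stand-in for the orientation classifier (here: a crude heuristic
    that checks whether the shoulders appear in mirrored order)."""
    lx, rx = pose2d["lshoulder"][0], pose2d["rshoulder"][0]
    return "front" if lx > rx else "back"

def regress_hierarchy(orientation, joints):
    """Stand-in for one orientation-tuned regressor; returns dummy
    BVH-style rotation triplets, one per joint."""
    return np.zeros(3 * len(joints))

def estimate_pose(pose2d):
    # 1. Pick the ensemble member tuned for the detected body orientation.
    orientation = classify_orientation(pose2d)
    # 2. Regress the upper and lower kinematic hierarchies separately, so a
    #    heavily occluded lower body does not corrupt the upper-body result.
    upper = regress_hierarchy(orientation, [pose2d.get(j) for j in UPPER_BODY])
    lower = regress_hierarchy(orientation, [pose2d.get(j) for j in LOWER_BODY])
    return np.concatenate([upper, lower])
```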

The given BVH file's results seem bad #56

Closed visonpon closed 2 years ago

visonpon commented 3 years ago

Hi, I have downloaded the provided BVH file from your demo, but the results seem bad: the pose is sometimes wrong, and the root motion has big problems like drifting and jitter. I wonder whether this is a problem with the BVH file or a limitation of the model.

AmmarkoV commented 3 years ago

Hello, this is real-time 3D pose estimation from a monocular RGB YouTube video, with no calibration provided, no body dimensions for the tracked subject, motion blur, and a moving camera (as seen here -> https://www.youtube.com/watch?v=GtJct8nKjcc#t=29s). The BVH output is standard, so I don't know what could go wrong with it; the ensemble and the HCD fine-tuning are doing their best to accommodate the above challenges. This paper describes exactly what is happening under the hood.

visonpon commented 3 years ago

Hello, I have built the environment and gone through the process of producing a BVH file from my own video (with a fixed camera position and high resolution); it seems the model has limited ability on custom videos. Nonetheless, this work still inspires me to do some work based on it, thanks~

AmmarkoV commented 3 years ago

Some tips for better I/O:

A) It is important for the input video's aspect ratio to be close to the 16:9 used during training. The "best" input videos come from GoPro cameras at 1920x1080 resolution.

B) The WebcamLiveDemo uses a homebrewed 2D estimator that is not very accurate; if you pass --openpose, it will use a "heavier" 2D joint estimator that is slightly better.

C) For even better results I personally use a standalone OpenPose installation and dump .json files that I then convert to .csv files; see the sketch after this list. A user has contributed this guide, https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/doc/OpenPose.md, which might be helpful for this.
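A minimal sketch of that JSON-to-CSV conversion step, assuming OpenPose's default per-frame JSON dumps with the BODY_25 model (flat x,y,confidence triplets under `pose_keypoints_2d`); the column names and output filename below are placeholders, so follow the linked OpenPose.md guide for the exact CSV schema MocapNET expects:

```python
# Sketch: flatten per-frame OpenPose JSON dumps into one CSV, one row per
# frame. Assumes OpenPose's default JSON layout ("people" ->
# "pose_keypoints_2d" as flat x,y,confidence triplets). The header naming is
# a placeholder; use the schema from the linked guide for real MocapNET input.
import csv, glob, json

def openpose_json_to_csv(json_dir, csv_path, num_joints=25):
    frames = sorted(glob.glob(f"{json_dir}/*_keypoints.json"))
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        header = []
        for j in range(num_joints):
            header += [f"2DX_{j}", f"2DY_{j}", f"visible_{j}"]
        writer.writerow(header)
        for path in frames:
            with open(path) as f:
                data = json.load(f)
            people = data.get("people", [])
            if people:  # take the first detected person in the frame
                row = people[0]["pose_keypoints_2d"][:3 * num_joints]
            else:       # no detection in this frame: emit zeros
                row = [0.0] * (3 * num_joints)
            writer.writerow(row)

openpose_json_to_csv("openpose_output", "2dJoints.csv")
```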

In the next version I will add a MediaPipe binding to improve accuracy/performance and phase out my 2D estimator.

AmmarkoV commented 3 years ago

The MediaPipe wrapper is underway here: https://github.com/FORTH-ModelBasedTracker/MocapNET/tree/master/src/python/mediapipe although it is not yet functional.
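Until that binding is functional, here is a rough sketch of the 2D side of such a pipeline using the public `mediapipe` Python API; this is not the MocapNET wrapper itself, only an illustration of how 2D landmarks could be harvested from a video:

```python
# Sketch: extract normalized 2D pose landmarks from a video with the public
# MediaPipe Python API. This is NOT the MocapNET wrapper linked above (which
# was not yet functional at the time of writing); it only shows the 2D side.
import cv2
import mediapipe as mp

def extract_2d_landmarks(video_path):
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            frames.append([(lm.x, lm.y, lm.visibility)
                           for lm in results.pose_landmarks.landmark])
        else:
            frames.append(None)  # no person detected in this frame
    cap.release()
    pose.close()
    return frames  # normalized [0,1] coordinates, 33 landmarks per frame
```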