FORTH-ModelBasedTracker / MocapNET

We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) a novel and compact 2D pose NSRM representation; (b) a human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose, while also allowing the decomposition of the body into an upper and a lower kinematic hierarchy, which permits the recovery of the human pose even under significant occlusions; (c) an efficient inverse kinematics solver that refines the neural-network-based solution, providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All of the above yield a 33% accuracy improvement on the Human3.6M (H36M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance.
https://www.youtube.com/watch?v=Jgz1MRq-I-k

Questions about converting OpenPose .json files to .bvh #51

Closed: Aurosutru closed this issue 3 years ago.

Aurosutru commented 3 years ago

When attempting to run convertOpenPoseJSONToCSV on a copy of the OpenPose output folder, which contains 273 files named like 000000000xxx_keypoints.json, the following is output:

tm@tm-VirtualBox:~/Downloads/MocapNET$ ./convertOpenPoseJSONToCSV --from output/ -o
File output//colorFrame_0_00001.jpg does not exist, unable to get its dimensions..
Assuming default image dimensions 1920x1080 , you can change this using --size x y
Threshold is set to 0.50
Processing : 
Stopping search after 1000 checks ..
findFirstJSONFileInDirectory: failed to find any JSON files.. :(
Path : output/ 
Format : %s/%s%05u_keypoints.json 
Label : colorFrame_0_ 
Failed to find a JSON file..!
 tm@tm-VirtualBox:~/Downloads/MocapNET$ 

The name of the first JSON file is Test1_000000000000_keypoints.json.

Why can't this find the JSON files? The usage steps listed at the top of its code mention JPEG files but not JSON files. Can it convert OpenPose JSON files to CSV files without needing the JPEG files?
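Judging from the Format and Label lines in the error output above, the tool appears to construct candidate filenames by combining the label with a zero-padded frame number. A minimal shell illustration of the name it probes for (the format string is taken verbatim from the log; frame index 0 is just an example):

    printf 'output/colorFrame_0_%05u_keypoints.json\n' 0
    # prints: output/colorFrame_0_00000_keypoints.json

Since the actual files are named Test1_000000000000_keypoints.json, that search never matches.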


How is the MocapNET2CSV utility to be used? The comment section at the top of its code refers to a test.csv input file, but after running initialize.sh there is no test.csv file to be found in the MocapNET directory, only a TestCSV binary.

The result of running this program is:

> tm@tm-VirtualBox:~/Downloads/MocapNET$ ./MocapNET2CSV --from output --visualize
> CPU :  Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
> GPU :  VMware SVGA II Adapter
> Visualization enabled
> The path output doesn't look like a CSV file.. 
> tm@tm-VirtualBox:~/Downloads/MocapNET$ Your version is up to date

Is this utility designed to accept the output from the convertOpenPoseJSONToCSV utility and convert it into a BVH file?
Or will it actually work on the output folder of OpenPose JSON files directly, as apparently stated?

AmmarkoV commented 3 years ago

Hello, check this dataset (1.5 GB).

After downloading and extracting it somewhere, if you use

 ./MocapNET2CSV --from path/to/sven.mp4-data/2dJoints_v1.4.csv

you should see the program operating on OpenPose output.

Image files should have names like colorFrame_0_xxxxx.jpg, and OpenPose files should follow the colorFrame_0_xxxxx_keypoints.json name pattern.
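If your OpenPose output uses a different prefix (such as Test1_ above), a minimal rename sketch along these lines could map the files onto the expected pattern. It assumes bash, that the directory contains only the keypoint files, that no file already uses the target names, and that the zero-padded names sort in frame order:

    i=0
    for f in output/*_keypoints.json; do
        # move each file onto the colorFrame_0_XXXXX_keypoints.json pattern
        mv "$f" "$(printf 'output/colorFrame_0_%05u_keypoints.json' "$i")"
        i=$((i+1))
    done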

The .json files must first be converted to a CSV file (see 2dJoints_v1.4.csv). You can do this using

./convertOpenPoseJSONToCSV  --from path/to/sven.mp4-data/

This script shows the steps performed on a video file.
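Putting the two steps together, a minimal end-to-end pass over an OpenPose output folder might look like the following sketch; the paths are placeholders, and the --size values are an assumption that should match the source video resolution:

    ./convertOpenPoseJSONToCSV --from path/to/openpose-output/ --size 1920 1080
    ./MocapNET2CSV --from path/to/openpose-output/2dJoints_v1.4.csv --size 1920 1080 --visualize

On success, MocapNET2CSV writes its result to out.bvh, as the logs later in this thread show.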

Aurosutru commented 3 years ago

Using the input formats exactly as you described, both programs are now working nicely.

The README.md has a picture captioned ./MocapNET2LiveWebcamDemo --from shuffle.webm --openpose --show 3 --frames 375 that shows the dancer with closed fists, but the armature has open fingers. With some tweaks, OpenPose can output 2D finger data quite accurately. Can MocapNET convert those JSON files to finger movements in a .bvh file, perhaps via a --hand argument?

I have put together a short workflow document for processing 2D video with OpenPose at its highest accuracy settings and then converting the output to a .bvh file using MocapNET. Would this be of interest for posting here or elsewhere?

AmmarkoV commented 3 years ago

Good to hear! The hands are a work in progress; they are already supported in my development snapshot, but the work on extending the neural network to accommodate them has not yet been published. The picture is probably from my dev snapshot (it's hard to enable and disable features all the time, so it slipped by). The code in this repository lacks the hand NSRM conversion code as well as the neural network (.pb) models to handle them; however, I hope to publish them soon, so in the future there will also be hand and face controls (along with any quality improvements on the body).

This is a snapshot of the dev version (screenshot: screen-2021-02-02-09-56-22).

For the OpenPose documentation, you can either fork the repository, populate doc/OpenPose.md, and open a pull request so I can bring it into the repository, or, if you have a blog or website, you can write it there and I will link to your guide from the README!

Aurosutru commented 3 years ago

That's very good news. The finger tracking in this picture, even from a long distance, looks great.

When might hand and face support land in the public version of MocapNET? Weeks, months, days...?

The Video to .bvh Workflow doc is nearly complete and ready for forking and pulling, but there is an issue: the movements of the .bvh do not match the input video movements at all. Using the --size 640 480 flag on both conversion utilities' command lines seems to help locate the skeleton onscreen, but the skeleton movements in both MocapNET2CSV and the .bvh in Blender still don't match the original video. Are other flags needed?
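For reference, this is the pattern being tried, passing the same (assumed) source resolution to both utilities; BRUout/ is the OpenPose output folder from the log below:

    ./convertOpenPoseJSONToCSV --from BRUout/ --size 640 480
    ./MocapNET2CSV --from BRUout/2dJoints_v1.4.csv --size 640 480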

In the onscreen graphical output there are frequent yellow normalization warnings in the lower right corner. Using OpenPose .json input data, here is partial MocapNET2CSV terminal output:


...
MocapNET: Received 417 elements
Upper Body has an orientation classifier..
Orientation : Front(0.18)/Back(0.21)/Left(0.08)/Right(0.54)
Upper Body Left Orientation changed from 6.39 to 96.39
Lower Body Left Orientation changed from -2.53 to 87.47
Warning: Detected pose behind camera! .. Fixed using previous frame ! ..
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 3/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 6/30!
EXPLODING GRADIENT @ hip 1/30!
IK 1602 μsec|Body|lr=0.010|maxStartLoss=30000.0|Iterations=5|epochs=30
Sample 1840 - 12.6300ms - 79.1766 fps
visualizeInput2DSkeletonFromSkeletonSerialized 4 points seem to be incorrectly normalized signaled units are 1.00 x 1.00
visualizeInput2DSkeletonFromSkeletonSerialized 22 points seem to be incorrectly normalized signaled units are 1920 x 1080
MocapNET: Received 417 elements
Upper Body has an orientation classifier..
Orientation : Front(0.10)/Back(0.28)/Left(0.56)/Right(0.06)
Upper Body Left Orientation changed from 7.10 to 97.10
Lower Body Left Orientation changed from -2.55 to 87.45
Warning: Detected pose behind camera! .. Fixed using previous frame ! ..
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 5/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 3/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 5/30!
IK 1627 μsec|Body|lr=0.010|maxStartLoss=30000.0|Iterations=5|epochs=30
Sample 1841 - 12.9690ms - 77.1069 fps
visualizeInput2DSkeletonFromSkeletonSerialized 18 points seem to be incorrectly normalized signaled units are 1.00 x 1.00
visualizeInput2DSkeletonFromSkeletonSerialized 22 points seem to be incorrectly normalized signaled units are 1920 x 1080
Finished with 1842/1842 frames
Successfully wrote 1842 frames to bvh file out.bvh..

MocapNET v2.1 execution summary :


BVH subsystem version 0.5
CPU : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
GPU :
Total 11415.97 ms for 1842 samples - Average 6.20 ms - 161.35 fps
Code optimizations where on
tm@tm-VirtualBox:~/Downloads/MocapNET$ ./MocapNET2CSV --from BRUout/2dJoints_v1.4.csv --size 640 480

Aurosutru commented 3 years ago

Just realized that part of the current non-matching issue might be due to my using the higher-accuracy --model_pose BODY_25B flag in OpenPose, though it apparently has 25 keypoints, like the standard BODY_25 model.
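For reference, a minimal OpenPose invocation that sticks to the default BODY_25 model and writes the per-frame JSON files might look roughly like this (the video and output paths are placeholders):

    ./build/examples/openpose/openpose.bin --video input.mp4 --model_pose BODY_25 --write_json output/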

Aurosutru commented 3 years ago

Reverting to the default OpenPose model without --hand processing produces somewhat better but still erratic onscreen and .bvh body skeleton movements. The blue onscreen error messages in the lower right corner now read: Invalid normalized input points. This video shows only shoulders, arms, and hips, plus hand details that were not used in this run. I will try a full-body video next.

Here is partial terminal output:


...
MocapNET: Received 417 elements
Upper Body has an orientation classifier..
Orientation : Front(0.89)/Back(0.00)/Left(0.02)/Right(0.09)
Upper Body Front
Nothing changed on lower body, returning previous result..
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 4/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 4/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 4/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 4/30!
EXPLODING GRADIENT @ hip 1/30!
EXPLODING GRADIENT @ chest 4/30!
IK 1126 μsec|Body|lr=0.010|maxStartLoss=30000.0|Iterations=5|epochs=30
Sample 1841 - 10.3690ms - 96.4413 fps
visualizeInput2DSkeletonFromSkeletonSerialized 6 points seem to be incorrectly normalized signaled units are 1.00 x 1.00
visualizeInput2DSkeletonFromSkeletonSerialized 138 points seem to be incorrectly normalized signaled units are 1920 x 1080
Finished with 1842/1842 frames
Successfully wrote 1842 frames to bvh file out.bvh..

MocapNET v2.1 execution summary :


BVH subsystem version 0.5
CPU : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
GPU :
Total 9609.76 ms for 1842 samples - Average 5.22 ms - 191.68 fps
Code optimizations where on
tm@tm-VirtualBox:~/Downloads/MocapNET$ ./MocapNET2CSV --from BRUout4/2dJoints_v1.4.csv --size 640 480

Aurosutru commented 3 years ago

I am having good success with .bvh outputs from full-body dance videos. Talking heads with arms and no visible legs are working too. It seems that having enough joints, including the head, visible in the video is important for smooth and reliable .bvh movements.

When some videos are processed, the resulting .bvh starts at one position in Blender and then wanders to another location over the first 20 frames, where it settles down. This motion is probably fairly easy to correct in Blender, but are there some MocapNET settings in the conversion process that would eliminate it? Maybe the video resolution setting? shuffle.webm doesn't exhibit this wandering.

I have submitted a workflow doc for creating .bvh files using OpenPose and MocapNET utilities. Hope it is helpful.

Looking forward to more documentation and to finger control. Great work here.

Aurosutru commented 3 years ago

Just found that most of the questions raised above have been answered in the comment section at the beginning of convertOpenPoseJSONToCSV.cpp.

AmmarkoV commented 3 years ago

:)