FORTH-ModelBasedTracker / MocapNET

We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) A novel and compact 2D pose NSRM representation. (b) A human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose by also allowing for the decomposition of the body to an upper and lower kinematic hierarchy. This permits the recovery of the human pose even in the case of significant occlusions. (c) An efficient Inverse Kinematics solver that refines the neural-network-based solution providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All the above yield a 33% accuracy improvement on the Human 3.6 Million (H3.6M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance.
https://www.youtube.com/watch?v=Jgz1MRq-I-k

Is it possible for Mocapnet to run on video instead of livecam? #64

Closed CGMikeG closed 3 years ago

CGMikeG commented 3 years ago

Is it possible for MocapNET to run on video instead of a live webcam? Or is this something that will be added to MocapNET down the pipeline?

AmmarkoV commented 3 years ago

Yes! As seen in the README.md, instead of supplying a path to a webcam stream you can supply the path to a video file in the "--from" parameter of the MocapNET2LiveWebcamDemo utility..! Please always keep in mind that this utility uses a homebrewed 2D joint estimator that is not as accurate as the official implementation of OpenPose, and that, since aspect ratio is important, a 1920x1080 video file gives the best results..

I am attaching the relevant parts of the README.md file here..!

Using videos instead of livecam


Testing the library using a pre-recorded video file (i.e. not live input) means you can use a slower but more precise 2D joint estimation algorithm like the included OpenPose implementation. Keep in mind that this OpenPose implementation does not use PAFs and is therefore still not as precise as the official OpenPose implementation. To run the demo with a pre-recorded file, issue:

./MocapNET2LiveWebcamDemo --from /path/to/yourfile.mp4 --openpose

We have included a video file that should be automatically downloaded by the initialize.sh script. Issuing the following command should run it and produce an out.bvh file even if you don't have a webcam or other video files available:

./MocapNET2LiveWebcamDemo --from shuffle.webm --openpose --frames 375
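If you want to process several pre-recorded files in a row, a small wrapper along these lines can loop over them. This is a sketch only: it assumes the MocapNET2LiveWebcamDemo binary sits in the current working directory and simply skips execution when it is absent, so the file list and the --frames value shown are illustrative.

```python
#!/usr/bin/env python3
"""Sketch: run MocapNET2LiveWebcamDemo over a list of video files.

Assumes the binary lives in the working directory (as after building
MocapNET); the video paths and --frames value are placeholders.
"""
import os
import subprocess


def build_command(video_path, frames=None):
    """Assemble the demo invocation for one pre-recorded video."""
    cmd = ["./MocapNET2LiveWebcamDemo", "--from", video_path, "--openpose"]
    if frames is not None:
        cmd += ["--frames", str(frames)]
    return cmd


if __name__ == "__main__":
    for video in ["shuffle.webm"]:
        cmd = build_command(video, frames=375)
        print(" ".join(cmd))
        # Only run when the binary is actually present.
        if os.path.exists(cmd[0]):
            subprocess.run(cmd, check=True)
```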

Higher accuracy using OpenPose JSON files


In order to get higher-accuracy output than the live demo (which is more performance oriented), you can use OpenPose and the 2D output JSON files it produces. The convertOpenPoseJSONToCSV application can convert them to a BVH file. After downloading and building OpenPose, you can use it to acquire 2D JSON body pose data by running:

build/examples/openpose/openpose.bin -number_people_max 1 --hand --write_json /path/to/outputJSONDirectory/ -video /path/to/yourVideoFile.mp4

This will create files named in the following fashion: /path/to/outputJSONDirectory/yourVideoFile_XXXXXXXXXXXX_keypoints.json. Notice that the generated filenames encode the frame serial number zero-padded to 12 characters (marked as X). You provide this length to our executable using the --seriallength command-line option.
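The naming scheme above can be reproduced with a few lines of Python. This is a sketch of the pattern only; the directory, label, and frame index are placeholders matching the README example, not fixed by MocapNET.

```python
def openpose_json_name(output_dir, label, frame_index, serial_length=12):
    """Build the keypoints filename OpenPose writes for one frame:
    <label>_<frame index zero-padded to serial_length>_keypoints.json
    """
    serial = str(frame_index).zfill(serial_length)
    return f"{output_dir}/{label}_{serial}_keypoints.json"


# First frame of yourVideoFile.mp4, 12-character serial as in the README:
print(openpose_json_name("/path/to/outputJSONDirectory", "yourVideoFile", 0))
# /path/to/outputJSONDirectory/yourVideoFile_000000000000_keypoints.json
```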

A dump_and_process_video.sh script is included that can fully process a video file using OpenPose and then feed the result through MocapNET, or serve as a guide for this procedure.

A utility is included that can convert the JSON files to a single CSV file by issuing:

 ./convertOpenPoseJSONToCSV --from /path/to/outputJSONDirectory/ --label yourVideoFile --seriallength 12 --size 1920 1080 -o .

For more information on how to use the conversion utility, please see the documentation inside the utility.
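If you want to inspect the JSON files yourself before conversion, each one typically holds a "people" array whose "pose_keypoints_2d" field is a flat x, y, confidence list. The sketch below pulls those triples out; the field names follow the common OpenPose output layout and are not anything MocapNET-specific.

```python
import json


def read_pose_keypoints(json_text):
    """Return (x, y, confidence) triples for the first detected person,
    or an empty list when no person was found in the frame."""
    data = json.loads(json_text)
    people = data.get("people", [])
    if not people:
        return []
    flat = people[0]["pose_keypoints_2d"]
    # The list is flat: x0, y0, c0, x1, y1, c1, ...
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Minimal synthetic frame with two joints:
sample = '{"people": [{"pose_keypoints_2d": [100.0, 200.0, 0.9, 110.0, 210.0, 0.8]}]}'
print(read_pose_keypoints(sample))
# [(100.0, 200.0, 0.9), (110.0, 210.0, 0.8)]
```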

A sample CSV file is included; it can be processed by issuing:

 ./MocapNET2CSV --from dataset/sample.csv --visualize --delay 30

The delay is added to every frame so that the user has enough time to see the results. Of course, the visualization contains only the armature, since the CSV file does not include the input images.

Check out this guide contributed by a project user for more info.