Dene33 / video_to_bvh

Convert human motion from video to .bvh
369 stars 112 forks

Questions regarding 3D lifting from other 2D detectors #8

Closed timtensor closed 5 years ago

timtensor commented 5 years ago

Hi, great work! I was wondering if it is possible to use other 2D detectors as input. There are two possible detectors that output the 2D joints: hrnet and posenet.

For hrnet, there is a Python implementation at https://github.com/lxy5513/hrnet and the keypoints are returned here: https://github.com/lxy5513/hrnet/blob/master/pose_estimation/demo.py#L153, where

preds.shape = (N, 17, 2), N is the number of video frames, 2 are the (x, y) coordinates
maxvals.shape = (N, 17, 1), 1 is the confidence of each coordinate

I think the output keypoints have to be rearranged a bit to match the openpose format, which looks like: {"people": [{"pose_keypoints_2d": [374, 460, 374, 516, 324, 518, 296, 596, 336, 636, 424, 512, 446, 590, 424, 604, 340, 660, 324, 776, 308, 890, 400, 660, 402, 792, 400, 904, 364, 448, 382, 450, 348, 450, 396, 450]}]}
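A minimal sketch of that rearrangement, assuming HRNet emits the standard COCO-17 keypoint order and the target is OpenPose's 18-point COCO layout (the index mapping and the synthesized neck joint are assumptions to verify against both repos; note the example JSON above stores only x, y pairs, while openpose JSON files usually interleave a confidence value as well):

```python
import json

# Assumed mapping: COCO-17 source index for each OpenPose-COCO-18 slot
# (None marks the neck joint, which COCO-17 lacks).
COCO17_TO_OP18 = [0, None, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]

def frame_to_openpose(kpts, confs):
    """kpts: list of 17 (x, y) pairs; confs: list of 17 confidences.
    Returns an openpose-style dict for one frame."""
    flat = []
    for src in COCO17_TO_OP18:
        if src is None:  # approximate neck as the midpoint of the two shoulders
            x = (kpts[5][0] + kpts[6][0]) / 2.0
            y = (kpts[5][1] + kpts[6][1]) / 2.0
            c = min(confs[5], confs[6])
        else:
            x, y, c = kpts[src][0], kpts[src][1], confs[src]
        flat += [x, y, c]
    return {"people": [{"pose_keypoints_2d": flat}]}

# One JSON file per frame, the way openpose writes them (file name assumed):
# with open("000000000000_keypoints.json", "w") as f:
#     json.dump(frame_to_openpose(preds[i], maxvals[i]), f)
```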

For posenet there is a Python implementation at https://github.com/rwightman/posenet-python, and the keypoints are returned here.

It also uses a different keypoint ordering, but that can be mapped to the openpose format as well.

You also mentioned that the 3D joints are exported in csv format. Is it also possible to export them to Unity for animation? Your thoughts and input would be appreciated.

Dene33 commented 5 years ago

The 2d estimation is mainly used to predict the bounding box of the person, so the video can be cropped and only that cropped part processed in the 3d estimation step (you can find out more about that in the default hmr repo). You can use any implementation, but be sure to have its outputs in the openpose format.
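The cropping step described above can be sketched roughly as deriving a padded bounding box from the 2d keypoints (a hypothetical helper, not the actual hmr code; the confidence threshold and margin are assumptions):

```python
def bbox_from_keypoints(keypoints, margin=0.2):
    """keypoints: list of (x, y, confidence) triples.
    Returns (x_min, y_min, x_max, y_max), padded by `margin` (a fraction
    of the box size) on every side; low-confidence joints are ignored."""
    pts = [(x, y) for x, y, c in keypoints if c > 0.05]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
```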

You can use the .csv file however you want. Just find it in one of the folders in Colab (I don't remember which one exactly, but you should find it easily).

timtensor commented 5 years ago

Thank you. I just have some questions, since I am trying to map the output from other 2d detectors to the same json file format.

exec(open('model_load.py').read())
!pip2 install opendr==0.77

- The following command is related to `ffmpeg`, right? This is where the video sequence is split into images:

# convert to images, specify fps rate
!bash video_to_images.sh 24
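For reference, video_to_images.sh presumably wraps an ffmpeg call of roughly this shape (the input file name and the output frame pattern here are assumptions, not the script's actual arguments):

```python
import subprocess

def ffmpeg_split_cmd(video_path, out_pattern, fps):
    """Build the ffmpeg argument list that dumps `video_path` as numbered
    frame images at the requested fps."""
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern]

# e.g. subprocess.run(ffmpeg_split_cmd("input.mp4", "frames/%05d.png", 24),
#                     check=True)
```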


- Can I read the `json files` frame by frame from another folder, where the output `json files` are located?

- The following takes the 2d joint input and outputs the `(x,y,z)` joints in csv format, right?
`bash hmr/3dpose_estimate.sh`

I think it would be really cool to try other joint detectors and see how each of them behaves. Thanks a lot for your previous answers.

I am trying to follow this repo:
https://github.com/Dene33/hmr

Dene33 commented 5 years ago

You can experiment with that as you like (skip some steps if you really know what you're doing). However, I'm not sure it makes much sense. As I've mentioned before, based on the 2d pose estimator's results, hmr just crops the person from the image(s). Any other information from that 2d prediction, like the positions of joints in space, is not used for the 3d reconstruction.

The ffmpeg step is to split the video into images, right.

You can skip the 2d pose estimation step and provide your own 2d estimated joints in openpose format (the sample_jsons folder). If you want another folder, you must specify which one here: hmr/3dpose_estimate.sh

3dpose_estimate.sh outputs the joint positions in .csv, right.

timtensor commented 5 years ago

Yes, as you mentioned, the 2d joint detectors only aim to get the bounding box; they don't really help estimate the 3D joints. I am trying different lifting methods from 2d video input and was curious to know more. Some of them are https://github.com/ArashHosseini/3d-pose-baseline and https://github.com/DenisTome/Lifting-from-the-Deep-release. But unfortunately, I am not getting any proper results using other pose detectors.

In your Python code, is it possible to input a video directly, or do I have to preprocess it?