facebookresearch / VideoPose3D

Efficient 3D human pose estimation in video using 2D keypoint trajectories

Ground Truth and Limb Lengths? #58

Open jimwgoldmine opened 5 years ago

jimwgoldmine commented 5 years ago

Hello, and congratulations on VideoPose3D - a magnificent piece of work.

I am assessing the accuracy of the already processed video. I figured a good way to do this would be to see how much the limb lengths vary from frame to frame.

As per https://github.com/facebookresearch/VideoPose3D, I ran:

python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin

I also added some code to def update_video(i): in visualization.py to print out the 3D coordinates of the skeleton joints for every frame.

Then, for each frame, I calculated the limb lengths, e.g. the distance from ankle to knee, from elbow to wrist, etc.
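For reference, here is a minimal sketch of that limb-length computation. It assumes the printed coordinates have been collected into a (num_frames, 17, 3) NumPy array in the 17-joint Human3.6M order used by this repo; the array and helper names are placeholders, not part of the codebase.

```python
import numpy as np

# Sketch (not from the repo): `poses` is assumed to be a (num_frames, 17, 3)
# array of 3D joint coordinates printed from update_video(i), in the 17-joint
# Human3.6M order used by VideoPose3D (0 hip, 1-3 right leg, 4-6 left leg,
# 7 spine, 8 thorax, 9 neck/nose, 10 head, 11-13 left arm, 14-16 right arm).
LIMBS = {
    'right_thigh':     (1, 2),    # right hip  -> right knee
    'right_shin':      (2, 3),    # right knee -> right ankle
    'left_thigh':      (4, 5),
    'left_shin':       (5, 6),
    'left_upper_arm':  (11, 12),
    'left_forearm':    (12, 13),  # left elbow -> left wrist
    'right_upper_arm': (14, 15),
    'right_forearm':   (15, 16),
}

def limb_lengths(poses):
    """Per-frame length of each limb, as a dict of (num_frames,) arrays."""
    return {name: np.linalg.norm(poses[:, j] - poses[:, i], axis=-1)
            for name, (i, j) in LIMBS.items()}
```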

For Ground Truth, the limb lengths are basically identical for each frame - great! For Reconstruction, there is a lot of variation.

This leads me to two questions (pardon my ignorance; I have looked these up without much success): What exactly is the ground truth, and why does it improve accuracy so much?

What determines the length of the limbs / the size of the skeleton?

Oh, one more question: the results from some videos show the person moving through space / over the ground, whereas others show them moving on the spot. For example, on the page https://github.com/facebookresearch/VideoPose3D, the walker is shown moving across the ground, but the skater stays on the spot. What determines this difference?

Thank you

dariopavllo commented 5 years ago

Hi,

The ground truth is simply the ground truth, as the name says (the labels from the test set), so it's normal that it's perfect. For the reconstruction there might be some variation, but it should not be that large (you should look at the standard deviation and compare it with the ground truth, e.g. by looking at the ratio).
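If it helps, here is a hedged sketch of that comparison, building on the earlier limb_lengths helper; pred_poses and gt_poses are assumed (num_frames, 17, 3) arrays for the same sequence, not names from the repo.

```python
# Hypothetical comparison along the lines suggested above: for each limb,
# compute the relative variation (std / mean) of its per-frame length for
# both the reconstruction and the ground truth, then look at the ratio.
def length_variation(poses):
    return {name: lengths.std() / lengths.mean()
            for name, lengths in limb_lengths(poses).items()}

def compare_variation(pred_poses, gt_poses):
    pred_var = length_variation(pred_poses)
    gt_var = length_variation(gt_poses)
    for name in pred_var:
        # A ratio much larger than 1 means the reconstructed limb length
        # fluctuates far more than the (nearly constant) ground truth.
        ratio = pred_var[name] / max(gt_var[name], 1e-9)
        print(f'{name}: pred {pred_var[name]:.4f}, '
              f'gt {gt_var[name]:.4f}, ratio {ratio:.1f}')
```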

The bone lengths are different for each subject (but constant across all videos of the same subject) and are defined in Human3.6M. These have been physically measured using motion capture. Our model does not explicitly output bone lengths, but only 3D joint positions. The constraints are learned approximately. At some point we tried to enforce constant bone lengths across a single video, but that didn't work so well.

Regarding the videos, it depends on whether we have the trajectory or not. Some videos show the test set of Human3.6M, where we have the trajectory and can add it to the final visualization. Videos in the wild don't have this information. In principle it could be estimated, but its accuracy wouldn't be very high, especially if the subject is far from the camera.

jimwgoldmine commented 5 years ago

Thanks

So, if I understand this correctly / to summarise:

The video I looked at, https://github.com/facebookresearch/VideoPose3D/blob/master/images/demo_h36m.gif (or rather the npz file generated from it), has a ground truth because it is one of the reference / test-set videos. An 'in the wild' video won't have a ground truth.

How is the ground truth generated ? From MoCap?

Also, as the above video is from a test set, it has trajectory information, so the subject can be shown moving over the ground. An 'in the wild' video won't have trajectory information, so the subject can't easily be shown moving over the ground.

How is the trajectory generated? From MoCap?

Are the above correct?

Thank you

slava-smirnov commented 4 years ago

@jimwgoldmine sorry for the necro, but did you have any success implementing the 3D trajectory for in-the-wild videos?