facebookresearch / VideoPose3D

Efficient 3D human pose estimation in video using 2D keypoint trajectories

Model works only with square frames #47

Open edrdos101 opened 5 years ago

edrdos101 commented 5 years ago

Hi,

I realised that the cpn-pt-243 checkpoint produces good results when the pose is estimated on square images. Whenever a rectangular image is used (e.g. 1280 x 720), the resulting 3D pose is somewhat skewed to one side.

For example, here's the original 1280 x 720 frame: [image: rect]

And here's the same frame of the video, cropped to 720 x 720 just before extracting the 2D pose coordinates: [image: sqr]

As you can see, the second one is much more accurate. However, it seems like this shouldn't really matter. Even if the training data was square(?), it shouldn't affect the results as long as I (quoting the author of the repo) normalize the 2D pose to the longer edge of the frame and keep the aspect ratio, which I do in both cases.

Could you please advise whether the model does indeed expect a square input image, or whether that's just a fluke?

Thanks

dariopavllo commented 5 years ago

The model should work with any aspect ratio, as long as the longest side is normalized to (-1, 1). You should also make sure that the shorter side is centered on the origin. For example, if your resolution is 1280 x 720, (0, 1280) should be mapped to (-1, 1) and (0, 720) should be mapped to (-0.5625, 0.5625). The position and scale of the 2D pose within the camera frame are important, so you should not normalize the pose independently of the frame.
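A minimal sketch of the mapping described above, written from the stated ranges rather than copied from the repository. `normalize_to_longest_side` is a hypothetical name, and the sketch assumes keypoints are (x, y) in pixels, with x along the width:

```python
import numpy as np

def normalize_to_longest_side(kpts, w, h):
    """Hypothetical helper: map pixel keypoints of shape (..., 2) so that the
    longest frame side spans [-1, 1] and both axes are centred on the origin,
    preserving the aspect ratio. Assumes kpts[..., 0] = x, kpts[..., 1] = y."""
    kpts = np.asarray(kpts, dtype=np.float64)
    assert kpts.shape[-1] == 2
    centre = np.array([w, h], dtype=np.float64) / 2.0
    half_longest = max(w, h) / 2.0
    # Shift the frame centre to (0, 0), then scale by half the longest side
    return (kpts - centre) / half_longest

# Top-left corner, bottom-right corner and centre of a 1280 x 720 frame
pts = np.array([[0, 0], [1280, 720], [640, 360]])
norm = normalize_to_longest_side(pts, w=1280, h=720)
# expected values: (-1, -0.5625), (1, 0.5625), (0, 0)
```

For landscape frames (w >= h) this gives the same values as the X/w*2 - [1, h/w] expression quoted later in the thread, since (2x - w)/w = 2x/w - 1 and (2y - h)/w = 2y/w - h/w.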

In the top figure I see two issues: the zero is at the corner of the image (not at the center), and something is wrong with the rescaling (the height ranges from 0 to -1.4?).

edrdos101 commented 5 years ago

Thank you! I noticed the same, but assumed it was caused by the shape of the input image, since I followed the normalization routine in your code. I still don't fully understand why it would be mapped incorrectly, unless I'm missing a step.

The normalization you see in the top figure comes from calling:

normalize_screen_coordinates(kpt[..., :2], w=frameSize[1], h=frameSize[0]), as in common.camera

Is there a subsequent step that should centre/rescale the pose coordinates?

PS: normalize_screen_coordinates returns X/w*2 - [1, h/w]
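Writing that expression out per axis (x, y in pixels; w, h the width and height passed to the function) shows that it already centres the frame on the origin and keeps the aspect ratio when w >= h:

```latex
\begin{aligned}
x' &= \frac{2x}{w} - 1, & x \in [0, w] &\;\Rightarrow\; x' \in [-1,\, 1], & x = \tfrac{w}{2} &\;\Rightarrow\; x' = 0,\\
y' &= \frac{2y}{w} - \frac{h}{w}, & y \in [0, h] &\;\Rightarrow\; y' \in \left[-\tfrac{h}{w},\, \tfrac{h}{w}\right], & y = \tfrac{h}{2} &\;\Rightarrow\; y' = 0.
\end{aligned}
```

For 1280 x 720, h/w = 720/1280 = 0.5625. If the two sizes are swapped in the call (w = 720, h = 1280), x' = 2x/720 - 1 runs from -1 to about 2.56 instead of staying in [-1, 1].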

dariopavllo commented 5 years ago

That's very strange. The function should already take care of everything. Maybe you have swapped frameSize[1] and frameSize[0]? Or maybe something is wrong with the range of kpt.
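One way to rule out both suspects is to check the inputs before normalizing. The sketch below is hypothetical (`check_keypoints_and_size` is not part of the repository) and assumes `frame_shape` is the shape of a NumPy/OpenCV image array, i.e. (height, width, channels); a size obtained as (width, height) from elsewhere would need reordering first:

```python
import numpy as np

def check_keypoints_and_size(kpts, frame_shape):
    """Hypothetical pre-normalization sanity check.

    Assumes frame_shape = (height, width[, channels]) and that kpts[..., :2]
    holds (x, y) coordinates in pixels. Returns (w, h) or raises ValueError."""
    h, w = frame_shape[0], frame_shape[1]
    xy = np.asarray(kpts, dtype=np.float64)[..., :2]
    if xy.min() < 0:
        raise ValueError("negative coordinates: keypoints may already be normalized")
    if xy.max() <= 1.0:
        raise ValueError("all coordinates <= 1: keypoints look normalized, not in pixels")
    # x beyond the width (or y beyond the height) hints that w and h are swapped
    if xy[..., 0].max() > w or xy[..., 1].max() > h:
        raise ValueError(f"keypoints exceed a {w}x{h} frame: check the width/height order")
    return w, h

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # H x W x C, like an OpenCV frame
kpts = np.array([[100.0, 50.0], [1200.0, 700.0]])
w, h = check_keypoints_and_size(kpts, frame.shape)
norm = kpts / w * 2 - [1, h / w]  # the expression quoted above
```

The bounds check is deliberately strict; 2D detectors can occasionally place a joint slightly outside the frame, so a small margin may be needed in practice.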