Open edrdos101 opened 5 years ago
The model should work with any aspect ratio, as long as the longest side is normalized to lie in (-1, 1). You should also make sure that the shortest side is centered on the origin. E.g. if your resolution is 1280 x 720, (0, 1280) should be mapped to (-1, 1), and (0, 720) should be mapped to (-0.5625, 0.5625). The position/scale of the 2D pose within the camera frame is important, so you should not normalize the pose itself.
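The mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `normalize_by_longest_side` is hypothetical and is not part of the repository; it simply divides by half of the longest side after moving the origin to the frame center.

```python
import numpy as np

def normalize_by_longest_side(points_px, width, height):
    """Hypothetical helper: map pixel coordinates so that the longest side
    spans [-1, 1] and the shorter side is centered on 0, keeping the
    aspect ratio unchanged."""
    points_px = np.asarray(points_px, dtype=np.float64)
    longest = max(width, height)
    center = np.array([width / 2.0, height / 2.0])
    # Shift the origin to the frame center, then scale by half the longest side.
    return (points_px - center) / (longest / 2.0)

# Corners and center of a 1280 x 720 frame.
frame_points = np.array([[0, 0], [1280, 720], [640, 360]])
normalized = normalize_by_longest_side(frame_points, 1280, 720)
print(normalized)
```

For the 1280 x 720 example, the corners land at (-1, -0.5625) and (1, 0.5625), and the frame center lands at (0, 0). The pose keeps its position and scale relative to the frame, which is the property the reply asks for.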
In the top figure, I see two issues: the zero is at the corner of the image (not at the center), and something is wrong with the rescaling (the height ranges from 0 to -1.4?).
Thank you! I noticed the same, but assumed it was due to the shape of the input image, since I followed the normalization routine in your code. I still don't fully understand why it would map incorrectly, unless I'm missing a step.
The normalization you see in the top figure comes from calling normalize_screen_coordinates(kpt[..., :2], w = frameSize[1], h = frameSize[0]), as in common.camera.
Is there a later step that should centre/rescale the pose coordinates?
PS normalize_screen_coordinates returns this:
X/w*2 - [1, h/w]
That's very strange. The function should already take care of everything. Maybe you are inverting frameSize[1] and frameSize[0]? Or maybe there is something wrong in the range of kpt.
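Both suspicions can be checked numerically. The sketch below re-implements the formula exactly as quoted in this thread (`X/w*2 - [1, h/w]`) rather than importing it from the repository, and the helpers `corner_ranges` and `keypoint_ranges` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Formula as quoted in this thread: X/w*2 - [1, h/w]
    X = np.asarray(X, dtype=np.float64)
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])

def corner_ranges(frame_w, frame_h, w, h):
    # Hypothetical helper: normalize the four frame corners with the given
    # w/h arguments and return the per-axis (min, max) of the result.
    corners = np.array([[0, 0], [frame_w, 0], [0, frame_h], [frame_w, frame_h]])
    out = normalize_screen_coordinates(corners, w=w, h=h)
    return out.min(axis=0), out.max(axis=0)

def keypoint_ranges(kpt):
    # Hypothetical helper: per-axis (min, max) of the x/y columns of a
    # keypoint array, to confirm the input is in pixel coordinates.
    xy = np.asarray(kpt, dtype=np.float64)[..., :2].reshape(-1, 2)
    return xy.min(axis=0), xy.max(axis=0)

# A 1280 x 720 frame, normalized with the right and with swapped arguments.
correct = corner_ranges(1280, 720, w=1280, h=720)
swapped = corner_ranges(1280, 720, w=720, h=1280)
print("correct w/h:", correct)
print("swapped w/h:", swapped)
```

With the correct arguments, x spans [-1, 1] and y spans [-0.5625, 0.5625]. With the arguments swapped, x runs from -1 to about 2.56 and y from about -1.78 to about 0.22, so the output is no longer centered on the origin. Calling `keypoint_ranges(kpt)` before normalizing should give values between 0 and the frame width and height if the keypoints are in pixels.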
Hi,
I realised that the cpn-pt-243 checkpoint produces good results when the pose is estimated for square images. Whenever a rectangular image is used (e.g. 1280 x 720), the resulting 3D pose is somewhat skewed to one side.
For example here's the original 1280 by 720 frame:
And here's the same video/frame but cropped to 720 by 720 just before extracting the 2D pose coordinates:
As you can see, the second one is a lot more accurate. However, it seems like it shouldn't really matter: although the training data was square(?), that shouldn't affect the results as long as I (quoting the author of the repo) normalize the 2D pose to the longer edge of the frame and keep the aspect ratio, which I do in both cases.
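A small arithmetic sketch of how the two setups differ after normalization, assuming the 720 x 720 crop is taken from the horizontal center of the frame (the post does not say where the crop sits) and using a made-up keypoint position. The function re-implements the formula quoted in this thread, `X/w*2 - [1, h/w]`, rather than importing it from the repository.

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Formula as quoted in this thread: X/w*2 - [1, h/w]
    X = np.asarray(X, dtype=np.float64)
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])

# Hypothetical keypoint in the full 1280 x 720 frame (pixel coordinates).
point_full = np.array([700.0, 400.0])

# Assumption: the 720 x 720 crop is horizontally centered, so 280 pixels
# are removed on the left and the keypoint shifts by that amount.
crop_offset_x = (1280 - 720) // 2
point_crop = point_full - np.array([crop_offset_x, 0.0])

full = normalize_screen_coordinates(point_full, w=1280, h=720)
crop = normalize_screen_coordinates(point_crop, w=720, h=720)
print("full frame:", full)
print("cropped   :", crop)
print("ratio     :", crop / full)
```

Under these assumptions the normalized coordinates of the same keypoint differ by a factor of 1280/720 (about 1.78) between the two cases, so the model receives a larger 2D pose from the cropped frame than from the full frame.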
Could you please advise whether the model does indeed expect a square input image, or whether that's just a fluke?
Thanks