Output format and rescaling

facebookresearch / frankmocap

A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

Output format and rescaling #154

Closed: tfederico closed this issue 2 years ago

tfederico commented 2 years ago

Hello,

In the documentation you say that, when predicting 3D coordinates, X and Y are "aligned to input image". Could you explain this in more detail? Does it mean that the model outputs something in the range [-1, 1] or [0, 1] and you rescale it based on the image size?

If I wanted to rescale them to the range [-1, 1], should I just divide them by the image width and height, respectively? Or do you perform a square crop (e.g., 256x256), in which case one should divide the output accordingly?

Also, does [0, 0] correspond to the top left corner of the image?

tfederico commented 2 years ago

Also how do you rescale the Z component?

penincillin commented 2 years ago

The original predictions from SMPL/MANO/SMPL-X use meters as the unit; let's call them p0. We first use the estimated weak-perspective camera parameters to scale these predictions to the input image space, obtaining p1. Since the input images are cropped from the original images, we then rescale p1 from the cropped image space to the original image space, obtaining p2. The units of p0, p1, and p2 are meters, pixels, and pixels, respectively. The z component uses the same scale factor as x and y, so the scaled mesh remains a valid (undistorted) mesh.
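A minimal NumPy sketch of the p0 → p1 → p2 chain described above, under stated assumptions: the weak-perspective camera is (s, tx, ty) applied as s·X + t in a crop normalized to [-1, 1]; the network input is a square crop of `crop_size` pixels (224 is an assumed value); and a single ratio `scale_o2n` (crop size divided by bbox side, so the bbox is assumed square) maps the original bbox to the crop. The function names and parameters are hypothetical, not the frankmocap API; the actual code in `mocap_utils/coordconv.py` may differ in details such as the order of scale and translation.

```python
import numpy as np

# Hypothetical sketch: names, crop_size=224 and the s*X + t convention are assumptions.

def model_to_crop(p0, cam, crop_size=224):
    """p0 (N, 3) in meters -> p1 (N, 3) in crop pixels, origin at the crop's top-left."""
    s, tx, ty = cam
    half = crop_size / 2.0
    p1 = np.asarray(p0, dtype=float) * s   # weak-perspective scale, applied to z as well
    p1[:, :2] += (tx, ty)                  # camera translation (x and y only)
    p1 *= half                             # normalized [-1, 1] crop coords -> pixels
    p1[:, :2] += half                      # move origin from crop center to top-left
    return p1

def crop_to_original(p1, scale_o2n, bbox_top_left):
    """p1 in crop pixels -> p2 in original-image pixels."""
    p2 = np.asarray(p1, dtype=float) / scale_o2n  # undo the bbox -> crop resize (z too)
    p2[:, :2] += bbox_top_left                    # shift by the bbox position (x and y only)
    return p2

p0 = np.array([[0.0, 0.0, 0.0], [0.1, -0.2, 0.05]])   # two points in meters
p1 = model_to_crop(p0, cam=(2.0, 0.0, 0.0))
p2 = crop_to_original(p1, scale_o2n=224 / 448, bbox_top_left=(100, 50))
```

Because x, y, and z are all multiplied by the same factors (s, crop_size / 2, and 1 / scale_o2n), distances between points scale uniformly, which is why the mesh stays valid; only x and y receive translations.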

tfederico commented 2 years ago

How is the cropping performed? Do you crop an N by N pixel square around the center of the bounding box?

penincillin commented 2 years ago

@tfederico Cropping is performed using a bounding box (bbox) defined by its top-left corner (in pixels), width, and height.
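A sketch of bbox cropping as described, assuming (x0, y0) is the top-left corner with x indexing columns and y indexing rows, and that the crop is later resized to a fixed size `crop_wh` (a hypothetical parameter). `crop_bbox` and `crop_point_to_original` are illustrative helpers, not frankmocap functions. Note that if a non-square bbox were resized to a square input, x and y would get different scale factors.

```python
import numpy as np

def crop_bbox(image, bbox):
    """Crop an (H, W, ...) image with bbox = (x0, y0, w, h) in pixels.

    Assumes (x0, y0) is the top-left corner, x indexes columns, y indexes rows.
    """
    x0, y0, w, h = (int(round(v)) for v in bbox)
    H, W = image.shape[:2]
    # Clip to the image so a bbox partly outside the frame still returns a valid view.
    xa, ya = max(x0, 0), max(y0, 0)
    xb, yb = min(x0 + w, W), min(y0 + h, H)
    return image[ya:yb, xa:xb]

def crop_point_to_original(xy, bbox, crop_wh):
    """Map a continuous (x, y) in a crop resized to crop_wh back to original-image pixels."""
    x0, y0, w, h = bbox
    cw, ch = crop_wh
    return np.array([x0 + xy[0] * w / cw, y0 + xy[1] * h / ch], dtype=float)

# Usage with a small synthetic image.
image = np.arange(10 * 8 * 3).reshape(10, 8, 3)   # H=10, W=8, 3 channels
crop = crop_bbox(image, (2, 3, 4, 5))            # x0=2, y0=3, w=4, h=5
center = crop_point_to_original((112, 112), (100, 50, 448, 448), (224, 224))
```

Here the center of a hypothetical 224x224 crop maps back to the center of a 448x448 bbox whose top-left corner is at (100, 50).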

tfederico commented 2 years ago

To convert the output back to SMPL space, I basically used these functions and reversed the order of the operations:

https://github.com/facebookresearch/frankmocap/blob/46de6e745e2c69d130bf5b5f1e1fb1ccc333cfa7/mocap_utils/coordconv.py#L18-L30

https://github.com/facebookresearch/frankmocap/blob/46de6e745e2c69d130bf5b5f1e1fb1ccc333cfa7/mocap_utils/coordconv.py#L33-L47

All the parameters (camera, rescale ratio, etc.) can be found in the pkl output.
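Reversing the order of the operations, as described above, can be sketched as follows. This is the inverse of a hypothetical forward chain (weak-perspective camera (s, tx, ty) applied as s·X + t, a square `crop_size` input assumed to be 224 pixels, and a single original-to-crop ratio `scale_o2n` plus the bbox top-left corner); it is not the code from `coordconv.py`. The pkl key names holding these values are not given in this thread, so check them against your own output.

```python
import numpy as np

# Hypothetical sketch: names, crop_size=224 and the s*X + t convention are assumptions.

def original_to_crop(p2, scale_o2n, bbox_top_left):
    """Original-image pixels -> crop pixels (undo the bbox shift, then the resize)."""
    p1 = np.asarray(p2, dtype=float).copy()
    p1[:, :2] -= bbox_top_left   # only x and y were translated
    return p1 * scale_o2n        # z was scaled by the same factor, so scale it back too

def crop_to_model(p1, cam, crop_size=224):
    """Crop pixels -> model space in meters (undo the weak-perspective camera)."""
    s, tx, ty = cam
    half = crop_size / 2.0
    p0 = np.asarray(p1, dtype=float).copy()
    p0[:, :2] -= half            # move origin from crop top-left to crop center
    p0 /= half                   # pixels -> normalized crop coordinates (z included)
    p0[:, :2] -= (tx, ty)        # undo the camera translation
    return p0 / s                # undo the camera scale

# Points assumed to come from a forward pass with cam=(2, 0, 0),
# scale_o2n=0.5 and bbox top-left (100, 50).
p2 = np.array([[324.0, 274.0, 0.0], [368.8, 184.4, 22.4]])
p1 = original_to_crop(p2, scale_o2n=0.5, bbox_top_left=(100, 50))
p0 = crop_to_model(p1, cam=(2.0, 0.0, 0.0))
```

Each inverse step undoes one forward step in reverse order, so the only requirement is that the same camera, scale ratio, and bbox values used for the forward conversion are read back for the inverse.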