facebookresearch / frankmocap

A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

Parameter meaning in od_renderer.py #104

Closed dldaisy closed 3 years ago

dldaisy commented 3 years ago

Hi! I've recently been trying to adapt your renderer code to pyrender, but I ran into some problems placing the camera. As I understand it, you place the SMPL mesh in the original image space by calling convert_smpl_to_bbox and convert_bbox_to_oriIm, instead of translating and scaling the camera as previous works such as SPIN do. However, you seem to transform the vertices again in od_renderer.py, as follows:

        input_size = 500
        f = 10
        print('max verts: ', max_verts)
        print('min verts: ', min_verts)
        verts[:, 0] = (verts[:, 0] - input_size) / input_size
        verts[:, 1] = (verts[:, 1] - input_size) / input_size

        verts[:, 2] /= (5 * 112)
        verts[:, 2] += f

        cam_for_render = np.array([f, 1, 1]) * input_size

I was wondering what input_size and f mean. Why should verts and cam_for_render be transformed in this way (for example, dividing z by 5 * 112 and then adding f)? Could you give me some explanation? Thank you very much!

penincillin commented 3 years ago

@dldaisy We adopt weak perspective projection (scaled orthographic projection) in rendering, which is not natively supported by opendr. Therefore, we simulate weak perspective projection by shrinking the z values of the mesh vertices, so that the depth variation is small compared with the distance to the camera. If you do the math and project x, y using cam_for_render as the camera intrinsics (you can neglect the camera extrinsics), you will find that the projected x, y are nearly the same as their original values.
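The calculation suggested above can be sketched in a few lines of NumPy. This is only an illustration, not the repository's code: the helper name simulate_weak_perspective and the sample vertices are made up, and it assumes that cam_for_render is laid out as [focal_length, center_x, center_y] and fed to a plain pinhole projection, which the thread does not state explicitly.

```python
import numpy as np

def simulate_weak_perspective(verts, input_size=500, f=10):
    """Hypothetical helper: apply the vertex transform quoted in the issue,
    then project with an ordinary pinhole camera."""
    v = np.asarray(verts, dtype=np.float64).copy()

    # Same transform as the quoted snippet: normalize x, y around input_size,
    # compress the depth range, and push the mesh to a distance of about f.
    v[:, 0] = (v[:, 0] - input_size) / input_size
    v[:, 1] = (v[:, 1] - input_size) / input_size
    v[:, 2] /= (5 * 112)
    v[:, 2] += f

    # Assumption: cam_for_render = [focal_length, center_x, center_y].
    cam_for_render = np.array([f, 1, 1]) * input_size
    focal, cx, cy = cam_for_render

    # Pinhole projection: u = focal * x / z + cx, and likewise for y.
    u = focal * v[:, 0] / v[:, 2] + cx
    w = focal * v[:, 1] / v[:, 2] + cy
    return np.stack([u, w], axis=1)

# Made-up vertices in image-space pixels (x, y) with a modest depth range (z).
verts = np.array([
    [100.0, 200.0, -50.0],
    [500.0, 500.0, 0.0],
    [900.0, 750.0, 80.0],
])
projected = simulate_weak_perspective(verts)
max_err = np.abs(projected - verts[:, :2]).max()
print(projected)
print("largest deviation from the original x, y:", max_err)
```

Under these assumptions the projection works out to u = (x - input_size) * f / (f + z / 560) + input_size, so the result matches x exactly when z = 0 and drifts only slightly as long as z / 560 stays small compared with f.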

dldaisy commented 3 years ago

@penincillin Thank you for your kind reply, but I'm still confused about one thing. As I recall, cam_for_render means [scale, trans_x, trans_y], so why should the camera be translated by 1 * input_size?

penincillin commented 3 years ago

[scale, trans_x, trans_y] are the camera parameters predicted by the CNN. These parameters are used to transform the original SMPL/MANO/SMPL-X vertices. The transformation consists of a translation and a scale.
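As a rough illustration of what applying [scale, trans_x, trans_y] to the vertices can look like, here is a minimal sketch. The function name apply_weak_perspective is hypothetical, and the order of operations (scale first, then translate x and y) is an assumption; the thread does not say which order the repository uses, and it may differ between the body and hand models.

```python
import numpy as np

def apply_weak_perspective(verts, cam):
    """Hypothetical helper: scale the vertices uniformly, then shift x and y.

    cam is assumed to be [scale, trans_x, trans_y].
    """
    scale, trans_x, trans_y = cam
    # Multiplying creates a new array, so the input is left untouched.
    out = np.asarray(verts, dtype=np.float64) * scale
    out[:, 0] += trans_x
    out[:, 1] += trans_y
    return out

# Made-up example: one vertex and one camera prediction.
verts = np.array([[1.0, 2.0, 3.0]])
cam = np.array([2.0, 0.5, -0.5])
transformed = apply_weak_perspective(verts, cam)
print(transformed)
```

Note that z is only scaled, never translated, which is what makes this a weak perspective model: depth affects the overall size through a single scale factor rather than per vertex.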

dldaisy commented 3 years ago

Thanks a lot!