facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

The scale of the depth #154

Closed ZYX-MLer closed 3 months ago

ZYX-MLer commented 3 months ago

I've been studying the data processing part of the code recently, and I've noticed that in the __getitem__ function, when the bounding box of the input image does not match cfg.input_img_shape, the image is resized along the x and y dimensions by the code at the bottom of the augmentation function:

for i in range(joint_num):
    joint_coord[i,:2] = trans_point2d(joint_coord[i,:2], trans)
    joint_valid[i] = joint_valid[i] * (joint_coord[i,0] >= 0) * (joint_coord[i,0] < cfg.input_img_shape[1]) * (joint_coord[i,1] >= 0) * (joint_coord[i,1] < cfg.input_img_shape[0])
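
For context, trans_point2d just applies the 2x3 affine matrix built for the crop/resize to a single (x, y) point. A minimal sketch of that helper (paraphrased from common/utils/preprocessing.py):

import numpy as np

def trans_point2d(pt_2d, trans):
    # trans is the 2x3 affine warp used for the crop/resize; apply it to one
    # point in homogeneous coordinates. It touches x and y only; the z stored
    # in joint_coord[i, 2] is never passed through it.
    src_pt = np.array([pt_2d[0], pt_2d[1], 1.0])
    dst_pt = np.dot(trans, src_pt)
    return dst_pt[:2]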

However, no corresponding processing is applied to the depth values. So during training, the x and y values the model predicts are relative to the (cropped and resized) training image, while the depth remains in the camera's coordinate space. Is this understanding correct? Should the depth be scaled accordingly as well?
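
To make the concern concrete, a toy example (made-up numbers, hypothetical uniform 2x warp):

import numpy as np

# Hypothetical 2x3 affine for a uniform 2x crop/resize (made-up numbers).
trans = np.array([[2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
joint = np.array([100.0, 50.0, 450.0])  # (x px, y px, z in camera-space mm)

joint[:2] = trans @ np.array([joint[0], joint[1], 1.0])
print(joint)  # [200. 100. 450.]: x and y are rescaled, z stays in raw mm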

mks0601 commented 3 months ago

You can see this function: https://github.com/facebookresearch/InterHand2.6M/blob/655ba3c29394a1d3fb7a96fd0e8d7e57fa948306/common/utils/preprocessing.py#L96
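
Paraphrasing what that function does (the function at the linked line appears to be transform_input_to_output_space; the cfg values below are stand-ins mirroring the repo's config names, and the code at the pinned commit is authoritative):

import numpy as np

class cfg:
    input_img_shape = (256, 256)    # (H, W) of the cropped training image
    output_hm_shape = (64, 64, 64)  # (D, H, W) of the output heatmap
    bbox_3d_size = 400              # depth range (mm) covered by the heatmap

def to_output_space(joint_coord, root_depth):
    out = joint_coord.copy()
    # x, y: input-image pixels -> heatmap cells (a plain 2D rescale)
    out[:, 0] = out[:, 0] / cfg.input_img_shape[1] * cfg.output_hm_shape[2]
    out[:, 1] = out[:, 1] / cfg.input_img_shape[0] * cfg.output_hm_shape[1]
    # z: camera-space mm -> root-relative mm -> discretized heatmap bin,
    # so the depth target is independent of the 2D crop/resize
    out[:, 2] = out[:, 2] - root_depth
    out[:, 2] = (out[:, 2] / (cfg.bbox_3d_size / 2) + 1) / 2.0 * cfg.output_hm_shape[0]
    return out

joints = np.array([[128.0, 64.0, 470.0], [96.0, 80.0, 455.0]])  # made-up (x px, y px, z mm)
print(to_output_space(joints, root_depth=460.0))

If I'm reading it right, the depth target the network is supervised with is root-relative and normalized by bbox_3d_size rather than expressed in image pixels, which is why the 2D resize in augmentation leaves z alone.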