TerenceCYJ / S2HAND

Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

Are the 'Camera Intrinsic Parameters' predicted by the network? #6

Closed: tlok666 closed this issue 3 years ago

tlok666 commented 3 years ago

Hi, I have a question about the camera intrinsic parameters 'Ks'. From your paper, I understood that the camera intrinsic parameters are predicted by the network, and line 210 of https://github.com/TerenceCYJ/S2HAND/blob/main/examples/utils/freihandnet.py seems to support this.

However, in the loss function you use the camera intrinsic parameters 'Ks' from the dataset to project 3D coordinates into 2D (line 48 of https://github.com/TerenceCYJ/S2HAND/blob/main/examples/train.py).

I am confused about the projection function. Why do you use the 'Ks' from the dataset? What is the relationship between those two operations?

Best!

TerenceCYJ commented 3 years ago

Hi.

The camera parameters (s, R, T) describe the position of the hand mesh in camera coordinates. The intrinsic parameters, in contrast, include the focal length and the optical center, and are used to project 3D coordinates into 2D space.

In our work, we use the intrinsics provided along with the input RGB image.
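
For readers following along, here is a minimal sketch of the projection described above, assuming a standard pinhole model: a 3D point in camera coordinates is mapped to pixels using the focal length and optical center stored in the intrinsic matrix. The function name `project_perspective` and the numeric values in `K_example` are illustrative assumptions, not taken from the S2HAND code.

```python
def project_perspective(points_3d, K):
    """Project 3D points in camera coordinates to 2D pixels (pinhole model).

    points_3d: list of (X, Y, Z) tuples with Z > 0.
    K: 3x3 intrinsic matrix as nested lists,
       [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    fx, fy = K[0][0], K[1][1]   # focal lengths in pixels
    cx, cy = K[0][2], K[1][2]   # optical center in pixels
    pixels = []
    for X, Y, Z in points_3d:
        # Perspective divide by depth, then shift by the optical center.
        u = fx * X / Z + cx
        v = fy * Y / Z + cy
        pixels.append((u, v))
    return pixels


# Hypothetical intrinsics and joints (metres), for illustration only.
K_example = [[500.0, 0.0, 112.0],
             [0.0, 500.0, 112.0],
             [0.0, 0.0, 1.0]]
joints = [(0.0, 0.0, 0.5), (0.05, -0.02, 0.5)]
uv = project_perspective(joints, K_example)
print(uv)
```

A point on the optical axis lands on the optical center (cx, cy); other points are offset in proportion to focal length divided by depth.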

tlok666 commented 3 years ago

Thanks for your reply.

As I understand it, the camera parameters (s, R, T) you mention only include scale and translation, as shown below: [image]

Does it matter whether this scale and translation are applied in camera space?

TerenceCYJ commented 3 years ago

When the camera intrinsics are used to project 3D joints into 2D and the learning is supervised in 2D, I think the scale and translation matter (although we evaluate aligned results for FreiHAND and HO3D). The rotation is used in rot_pose_beta_to_mesh.
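
A small sketch of why scale and translation affect 2D supervision: under perspective projection, the same root-relative joint lands on different pixels depending on where the hand is placed in camera space. The placement rule `scale * p + trans` and all names and values here are assumptions for illustration; the actual S2HAND code may apply these parameters differently.

```python
def to_camera_space(points, scale, trans):
    """Place root-relative points in camera space (assumed form: s * p + T)."""
    return [tuple(scale * p + t for p, t in zip(pt, trans)) for pt in points]


def project(points_cam, fx, fy, cx, cy):
    """Pinhole projection of camera-space points to pixels."""
    return [(fx * X / Z + cx, fy * Y / Z + cy) for X, Y, Z in points_cam]


# Hypothetical root-relative joint (metres) and intrinsics.
rel_joint = [(0.02, 0.0, 0.0)]
fx, fy, cx, cy = 500.0, 500.0, 112.0, 112.0

# Same joint, two different predicted translations along the depth axis.
near = project(to_camera_space(rel_joint, 1.0, (0.0, 0.0, 0.4)), fx, fy, cx, cy)
far = project(to_camera_space(rel_joint, 1.0, (0.0, 0.0, 0.8)), fx, fy, cx, cy)
print(near, far)
```

Because the two placements give different pixel coordinates, a 2D joint loss computed with the dataset intrinsics penalizes a wrong scale or translation, even when the root-relative pose is correct.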

BTW, another option that avoids the camera intrinsics is to use orthographic projection as in [1], but in that case you don't get the real 3D position in camera space.

[1] Learning Category-Specific Mesh Reconstruction from Image Collections.
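
To illustrate the trade-off, here is a sketch of an orthographic (weak-perspective) projection in its commonly used form, where a scale and a 2D translation replace the intrinsics. The function name, parameters, and values are assumptions for illustration and are not taken from [1] or from the S2HAND code.

```python
def project_orthographic(points_3d, scale, trans_2d):
    """Weak-perspective projection: u = s * X + tx, v = s * Y + ty.

    The depth Z is ignored, so no camera intrinsics are needed.
    """
    tx, ty = trans_2d
    return [(scale * X + tx, scale * Y + ty) for X, Y, _Z in points_3d]


# Two hypothetical points that differ only in depth.
pts = [(0.05, -0.02, 0.4), (0.05, -0.02, 0.8)]
uv_ortho = project_orthographic(pts, 1000.0, (112.0, 112.0))
print(uv_ortho)
```

Both points map to the same pixel, which is why this projection cannot recover the real 3D position in camera space: depth never enters the 2D supervision.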

tlok666 commented 3 years ago

It makes sense. Thank you very much!