facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

Transforming Camera Position from World to View Coordinate System #1556

Open OumGu opened 1 year ago

OumGu commented 1 year ago

Hello,

Thank you for providing the tutorial example at https://pytorch3d.org/tutorials/camera_position_optimization_with_differentiable_rendering. I have been modifying its code to optimize the camera pose from 6 initial parameters: roll, pitch, yaw (the three Euler angles), and x, y, z (the translation vector).

In my implementation, I separated the parameters into two vectors:

self.camera_translation = nn.Parameter(
    torch.from_numpy(np.array([0.0, 0.0, 30.0], dtype=np.float32)).to(meshes.device))

self.camera_rotation = nn.Parameter(
    torch.from_numpy(np.array([np.deg2rad(0.0), np.deg2rad(180.0), np.deg2rad(0.0)], dtype=np.float32)).to(
        meshes.device))

self.camera_translation and self.camera_rotation represent the camera's translation and rotation, respectively.

In the forward function, I convert the camera rotation parameters to a rotation matrix R with euler_angles_to_matrix, and I set the translation vector T directly from self.camera_translation:

R = euler_angles_to_matrix(self.camera_rotation, "XYZ").reshape((1, 3, 3))
T = self.camera_translation.reshape((1, 3))
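
The role of R and T can be sketched in plain Python. This is a minimal illustration, assuming the row-vector convention x_view = x_world @ R + T described in the PyTorch3D camera documentation; the helper names (rot_y, world_to_view, camera_center) are hypothetical and not part of PyTorch3D. Under that assumption T is not the camera position in world coordinates: the camera center is the world point that maps to the view origin, C = -T @ R^T.

```python
import math

def rot_y(angle):
    """Standard 3x3 rotation about the y-axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def transpose(m):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*m)]

def row_times_matrix(v, m):
    """Row vector v (length 3) times 3x3 matrix m."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

def world_to_view(x_world, R, T):
    """Assumed row-vector convention: x_view = x_world @ R + T."""
    rotated = row_times_matrix(x_world, R)
    return [rotated[i] + T[i] for i in range(3)]

def camera_center(R, T):
    """World-space point that maps to the view origin: C = -T @ R^T."""
    return [-c for c in row_times_matrix(T, transpose(R))]

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 30.0]  # same translation as in the post

center_identity = camera_center(IDENTITY, T)
center_flipped = camera_center(rot_y(math.pi), T)
origin_identity = world_to_view([0.0, 0.0, 0.0], IDENTITY, T)
origin_flipped = world_to_view([0.0, 0.0, 0.0], rot_y(math.pi), T)
```

With T = (0, 0, 30), the identity rotation places the camera at world z = -30, while a 180 degree rotation about y places it at world z = +30. In both cases the world origin ends up 30 units along view-space +Z.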

However, I encountered a problem where providing the exact same initial camera parameters did not result in the same camera view. I found that to obtain the desired view, I had to apply an additional rotation of 180 degrees around the y-axis.

Results without the additional rotation of 180 degrees around the y-axis: [image: issue]

I suspect that the issue arises from a confusion regarding the coordinate system. It seems that R and T might not be representing the camera position in the view coordinate system, and therefore, they need to be transformed properly to achieve the desired view.
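
One possible source of such a mismatch is a difference in view-space axis conventions. The sketch below assumes that the expected view comes from a tool with an OpenGL-style camera (+X right, +Y up, looking down -Z), whereas the PyTorch3D documentation describes view space as +X left, +Y up, +Z pointing into the scene. Neither assumption is confirmed by the post, and the names FLIP_XZ, rot_y, and opengl_to_pytorch3d are hypothetical. Under these assumptions, the change of axes is exactly a rotation of 180 degrees around the y-axis.

```python
import math

def rot_y(angle):
    """Standard 3x3 rotation about the y-axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

# Assumed conventions (not stated in the post):
#   PyTorch3D view space:    +X left,  +Y up, +Z into the scene
#   OpenGL-style view space: +X right, +Y up, -Z into the scene
# Going from one to the other negates the x and z axes.
FLIP_XZ = [[-1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, -1.0]]

def matrices_close(a, b, tol=1e-9):
    """Element-wise comparison of two 3x3 matrices."""
    return all(abs(a[i][j] - b[i][j]) <= tol
               for i in range(3) for j in range(3))

def opengl_to_pytorch3d(point):
    """Re-express a view-space point under the assumed axis flip."""
    x, y, z = point
    return [-x, y, -z]

# The axis flip coincides with a 180 degree rotation about y.
flip_is_y_rotation = matrices_close(FLIP_XZ, rot_y(math.pi))

# A point 5 units ahead and 1 unit to the right in OpenGL-style view space.
point_p3d = opengl_to_pytorch3d([1.0, 2.0, -5.0])
```

If the assumption holds, a pose defined in the other convention would need this fixed flip applied once when it is converted, rather than being absorbed into the optimized Euler angles.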

My question is: How can I transform R and T to properly represent the camera position in the view coordinate system, ensuring that I get the same view without requiring an additional rotation of 180 degrees around the y-axis?
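
For reference, the conversion being asked about can be sketched as follows. This assumes the row-vector convention x_view = x_world @ R + T, and a camera pose given as a world-space position C plus a camera-to-world rotation whose columns are the camera axes in world coordinates. The helper pose_to_rt is hypothetical, not a PyTorch3D function. Under these assumptions R is the camera-to-world rotation itself and T = -C @ R.

```python
import math

def rot_y(angle):
    """Standard 3x3 rotation about the y-axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def row_times_matrix(v, m):
    """Row vector v (length 3) times 3x3 matrix m."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

def pose_to_rt(R_c2w, C):
    """Hypothetical helper: camera pose in world space -> (R, T).

    R_c2w: 3x3 rotation whose columns are the camera axes in world space.
    C:     camera position in world space.
    Returns R, T such that x_view = x_world @ R + T (assumed convention).
    """
    R = [list(row) for row in R_c2w]
    T = [-t for t in row_times_matrix(C, R)]
    return R, T

def world_to_view(x_world, R, T):
    """Assumed row-vector convention: x_view = x_world @ R + T."""
    rotated = row_times_matrix(x_world, R)
    return [rotated[i] + T[i] for i in range(3)]

# Camera placed at world (0, 0, 30) and turned 180 degrees about y,
# so that its +Z axis points back toward the world origin.
R, T = pose_to_rt(rot_y(math.pi), [0.0, 0.0, 30.0])

origin_in_view = world_to_view([0.0, 0.0, 0.0], R, T)
camera_in_view = world_to_view([0.0, 0.0, 30.0], R, T)
```

In this sketch the resulting T is approximately (0, 0, 30): the camera itself maps to the view origin, and the world origin lies 30 units along view-space +Z.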

Thank you in advance.