I am using the PyTorch3D renderer to get camera poses, and I need to convert these poses so they can be directly used with OpenCV. From what I understand, the coordinate systems and projections in PyTorch3D and OpenCV differ:
PyTorch3D Camera Convention:
+X points left
+Y points up
+Z points outward from the camera (right-handed system).
OpenCV Camera Convention:
+X points right
+Y points down
+Z points outward from the camera (right-handed system).
I believe I need to flip the X and Y axes for the rotation matrix (R) and translation vector (T) when converting between these formats. I’ve implemented this as:
PyTorch3D to OpenCV Conversion:
R_opencv = R_pytorch3d.clone()
R_opencv[:, :2] *= -1 # Flip X and Y axes
T_opencv = T_pytorch3d.clone()
T_opencv[:2] *= -1 # Flip X and Y translation
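For reference, here is a minimal, self-contained numpy sketch of that flip (the pose values are made up for illustration). One thing worth flagging: PyTorch3D stores world-to-view transforms in a row-vector convention (x_cam = x_world @ R + T), while OpenCV uses column vectors (x_cam = R @ x_world + t), so a transpose of R may also be needed on top of the sign flips, depending on how the pose is consumed downstream:

```python
import numpy as np

# Hypothetical pose standing in for values read off a PyTorch3D camera.
R_pytorch3d = np.eye(3)
T_pytorch3d = np.array([0.5, -0.2, 3.0])

# Flip the X and Y axes: PyTorch3D's (+X left, +Y up) -> OpenCV's (+X right, +Y down).
R_opencv = R_pytorch3d.copy()
R_opencv[:, :2] *= -1  # negate the first two columns
T_opencv = T_pytorch3d.copy()
T_opencv[:2] *= -1     # negate the X and Y components

# Possibly required in addition (row-vector vs. column-vector convention):
R_opencv = R_opencv.T
```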
However, I am confused about the negative focal length aspect in PyTorch3D's screen-space projection. Why is a negative focal length necessary in PyTorch3D to get a proper screen-space projection, and how does this impact my conversion to OpenCV? Do I need to adjust the projection parameters or intrinsic matrix when using OpenCV?
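My current understanding, which may be wrong, is that the "negative focal length" is just how PyTorch3D's +X-left/+Y-up NDC convention shows up in the projection: with a positive focal length the rendered image would come out mirrored, so the sign flip re-orients it. If that is right, OpenCV's intrinsic matrix should keep positive pixel focal lengths, with the orientation difference absorbed entirely by the R/T axis flip. A sketch of building an OpenCV K from PyTorch3D NDC intrinsics, assuming the convention that NDC units are scaled by half the shorter image side and that the principal point is sign-flipped (please correct me if the scaling differs in current versions):

```python
import numpy as np

# Hypothetical PyTorch3D NDC-space intrinsics (illustrative values).
fx_ndc, fy_ndc = 2.5, 2.5   # focal lengths in NDC units
px_ndc, py_ndc = 0.0, 0.0   # principal point in NDC units
W, H = 640, 480             # image size in pixels

s = min(W, H) / 2.0         # assumed NDC-to-pixel scale
fx_px = fx_ndc * s
fy_px = fy_ndc * s
cx = W / 2.0 - px_ndc * s   # sign flip: +X points left in PyTorch3D NDC
cy = H / 2.0 - py_ndc * s   # sign flip: +Y points up in PyTorch3D NDC

# OpenCV intrinsic matrix with positive focal lengths.
K = np.array([[fx_px, 0.0,   cx],
              [0.0,   fy_px, cy],
              [0.0,   0.0,  1.0]])
```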
Any guidance on how to properly handle this projection difference and convert the pose to be directly usable in OpenCV would be greatly appreciated!