NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Camera space convention #141

Closed: ivashmak closed this issue 7 months ago

ivashmak commented 8 months ago

Hi, I'm new to the OpenGL camera convention, and I am currently trying to grasp some aspects of the projection process. Here is a brief overview of my setup:

I have mesh points in world space, for example, X (of size 3 x N, where N is the number of points). I have extrinsic camera parameters [R | t], and I have an intrinsic matrix K. For instance, I have obtained R, t, and K through OpenCV calibration. If I want to get points in the camera coordinate system, I do X_camera = R X + t, and my 2D projection is X_projection ~ K X_camera (up to scale).
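Concretely, the OpenCV-style pipeline described above can be sketched as follows. The R, t, and K values here are placeholders standing in for whatever the calibration actually returned:

```python
import numpy as np

rng = np.random.default_rng(0)
# 3 x N world-space points, pushed out in front of the camera.
X = rng.standard_normal((3, 5)) + np.array([[0.0], [0.0], [5.0]])

R = np.eye(3)                        # placeholder rotation from calibration
t = np.array([[0.1], [0.0], [0.0]])  # placeholder translation (3 x 1)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])  # placeholder intrinsics (pixels)

X_camera = R @ X + t              # camera-space points, 3 x N
X_projection = K @ X_camera       # homogeneous image coords, up to scale
pixels = X_projection[:2] / X_projection[2]  # divide by depth -> 2 x N pixels
```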

As far as I understand, in this project, you make use of normalized device coordinates (NDC) and clip space. My question is whether there is a way for me to utilize my camera matrices in a way that allows me to render my mesh points / faces onto an image?

s-laine commented 7 months ago

You can compute the 3D transformations as Rx + t if you have a 3×3 rotation matrix and a 3-component translation vector available. Converting from Cartesian 3D to homogeneous 4D coordinates can be done by just appending 1.0 as the w coordinate. You can either do that after rotation/translation in Cartesian 3D coordinates, or you can do it beforehand, which lets you combine rotation and translation into a single 4×4 transformation matrix (see, e.g., Section 4.2 here). Either way produces the same results.
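A quick sketch of the two equivalent routes just described, with arbitrary example values for the rotation, translation, and point:

```python
import numpy as np

rng = np.random.default_rng(1)
R = np.linalg.qr(rng.standard_normal((3, 3)))[0]  # an orthonormal 3x3 rotation
t = rng.standard_normal(3)                        # 3-component translation
x = rng.standard_normal(3)                        # a Cartesian 3D point

# Route 1: rotate/translate in Cartesian 3D, then append w = 1.0.
y_a = np.append(R @ x + t, 1.0)

# Route 2: append w = 1.0 first, then apply a single 4x4 matrix
# that combines rotation and translation.
M = np.eye(4)
M[:3, :3] = R
M[:3, 3] = t
y_b = M @ np.append(x, 1.0)

# Either way produces the same homogeneous result.
assert np.allclose(y_a, y_b)
```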

However, the perspective transformation cannot be done in Cartesian coordinates: it is a linear transformation in homogeneous space, and thus must be represented as a 4×4 matrix acting on homogeneous coordinates. The output of the perspective transformation is a set of 4D homogeneous clip-space coordinates, and these are what nvdiffrast expects as input.
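To illustrate, here is a standard OpenGL-style perspective matrix (one common parameterization, with the camera looking down −z) applied to a homogeneous camera-space point. The resulting 4-vector is the kind of clip-space coordinate nvdiffrast takes as input; note that the w component carries the negated camera-space depth:

```python
import numpy as np

def perspective(fov_y, aspect, near, far):
    """Symmetric OpenGL-style perspective matrix (camera looks down -z)."""
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0.0,  0.0,                            0.0],
        [0.0,        f,    0.0,                            0.0],
        [0.0,        0.0, -(far + near) / (far - near),   -2.0 * far * near / (far - near)],
        [0.0,        0.0, -1.0,                            0.0],
    ])

P = perspective(np.deg2rad(60.0), 4.0 / 3.0, 0.1, 100.0)
p_camera = np.array([0.5, -0.2, -3.0, 1.0])  # homogeneous camera-space point
p_clip = P @ p_camera                        # 4D clip space: nvdiffrast's input
ndc = p_clip[:3] / p_clip[3]                 # perspective divide -> NDC in [-1, 1]
```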

I don't know if you can get the full 4×4 perspective transformation matrix from OpenCV's camera calibration, so you may need to do some additional computation to determine it. Basically the only degrees of freedom in a perspective projection are the camera FOV in the x and y directions, which OpenCV does seem to give you. The link above has formulas for determining the appropriate matrix, under the heading "The Perspective Projection Matrix", although that matrix seems to be transposed w.r.t. the OpenGL convention.
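One common way to do that additional computation is to build the projection matrix directly from the calibrated intrinsics. The construction below is a sketch of one such mapping, not the only convention in use; it assumes zero skew and that camera-space points have already been brought into the OpenGL convention (x right, y up, camera looking down −z), so flips may be needed depending on how the OpenCV extrinsics are handled:

```python
import numpy as np

def opengl_projection_from_K(K, width, height, near, far):
    """One common mapping from OpenCV-style intrinsics K (in pixels) to an
    OpenGL-style projection matrix. Assumes zero skew and OpenGL camera
    convention (x right, y up, looking down -z); verify signs against
    rendered output for your setup."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([
        [2.0 * fx / width, 0.0,               1.0 - 2.0 * cx / width,          0.0],
        [0.0,              2.0 * fy / height, 2.0 * cy / height - 1.0,         0.0],
        [0.0,              0.0,              -(far + near) / (far - near),    -2.0 * far * near / (far - near)],
        [0.0,              0.0,              -1.0,                             0.0],
    ])

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])  # placeholder calibration result
P = opengl_projection_from_K(K, 640, 480, 0.1, 100.0)

# With the principal point at the image center, a point on the optical
# axis lands at the center of NDC space after the perspective divide.
p_clip = P @ np.array([0.0, 0.0, -5.0, 1.0])
ndc = p_clip[:3] / p_clip[3]
```

A useful sanity check on this construction: for a point that the pinhole model maps to pixel u, the resulting NDC x coordinate comes out as 2u/width − 1, i.e. the usual pixel-to-NDC mapping.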