Closed mikeroberts3000 closed 2 years ago
Here is my code for letting FoVPerspectiveCameras
compute its own K
matrix:
# Initialize a camera.
# look at the tip of the right brown bump on the cow's head
R, T = look_at_view_transform(eye=[[0.0, 5.0, 5.0]], up=[[0.0, 1.0, 0.0]], at=[[0.182289, 0.937539, -0.304334]])
print(R)
print(T)
# set each dimension to be half the size of a Hypersim image
width_pixels = 512
height_pixels = 384
fov_y = np.pi / 3.0
# aspect_ratio here is the aspect ratio of an individual pixel so we set to 1.0 regardless of width and height
cameras = FoVPerspectiveCameras(device=device, R=R, T=T, fov=fov_y, degrees=False, aspect_ratio=1.0)
raster_settings = RasterizationSettings(
image_size=[height_pixels, width_pixels],
blur_radius=0.0,
faces_per_pixel=1,
)
# Place a point light in front of the object. As mentioned above, the front of the cow is facing the
# -z direction.
lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])
# Create a Phong renderer by composing a rasterizer and a shader. The textured Phong shader will
# interpolate the texture uv coordinates for each vertex, sample from a texture image and
# apply the Phong lighting model
renderer = MeshRenderer(
rasterizer=MeshRasterizer(
cameras=cameras,
raster_settings=raster_settings
),
shader=SoftPhongShader(
device=device,
cameras=cameras,
lights=lights
)
)
images = renderer(mesh)
plt.imsave("ref_384_512.png", images[0, ..., :3].cpu().numpy())
plt.figure(figsize=(10, 10))
plt.imshow(images[0, ..., :3].cpu().numpy())
plt.axis("off");
Here is my code for supplying my own K
matrix:
# Initialize a camera.
# look at the tip of the right brown bump on the cow's head
R, T = look_at_view_transform(eye=[[0.0, 5.0, 5.0]], up=[[0.0, 1.0, 0.0]], at=[[0.182289, 0.937539, -0.304334]])
print(R)
print(T)
# set each dimension to be half the size of a Hypersim image
width_pixels = 512
height_pixels = 384
fov_y = np.pi / 3.0
near = 1.0
far = 100.0
# construct projection matrix
f_h = np.tan(fov_y/2.0)*near
f_w = f_h*width_pixels/height_pixels
left = -f_w
right = f_w
bottom = -f_h
top = f_h
M_proj = np.matrix(np.zeros((4,4)))
M_proj[0,0] = (2.0*near)/(right - left)
M_proj[1,1] = (2.0*near)/(top - bottom)
M_proj[0,2] = (right + left)/(right - left)
M_proj[1,2] = (top + bottom)/(top - bottom)
M_proj[2,2] = -(far + near)/(far - near)
M_proj[3,2] = -1.0
M_proj[2,3] = -(2.0*far*near)/(far - near)
M_pytorch_from_opengl = np.matrix(np.identity(4))
M_pytorch_from_opengl[3,3] = -1
K = (M_pytorch_from_opengl*M_proj).A
print(M_pytorch_from_opengl)
print(M_proj)
print(K)
# aspect_ratio here is the aspect ratio of an individual pixel so we set to 1.0 regardless of width and height
cameras = FoVPerspectiveCameras(device=device, R=R, T=T, fov=fov_y, degrees=False, aspect_ratio=1.0, K=[K])
# also doesn't work
# cameras = FoVPerspectiveCameras(device=device, R=R, T=T, K=[K])
raster_settings = RasterizationSettings(
image_size=[height_pixels, width_pixels],
blur_radius=0.0,
faces_per_pixel=1,
)
# Place a point light in front of the object. As mentioned above, the front of the cow is facing the
# -z direction.
lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])
# Create a Phong renderer by composing a rasterizer and a shader. The textured Phong shader will
# interpolate the texture uv coordinates for each vertex, sample from a texture image and
# apply the Phong lighting model
renderer = MeshRenderer(
rasterizer=MeshRasterizer(
cameras=cameras,
raster_settings=raster_settings
),
shader=SoftPhongShader(
device=device,
cameras=cameras,
lights=lights
)
)
images = renderer(mesh)
plt.imsave("user_K_384_512.png", images[0, ..., :3].cpu().numpy())
plt.figure(figsize=(10, 10))
plt.imshow(images[0, ..., :3].cpu().numpy())
plt.axis("off");
Notice that I'm accounting for the image aspect ratio exactly as described in the documentation for glFrustum
.
Update: I was able to get the correct rendering output for non-square images. But I needed to set the image width and height to be equal when computing my own K
matrix, and then set them to their true values when instantiating the RasterizationSettings
object.
This is an important difference between the K
matrix in FoVPerspectiveCameras
and the standard OpenGL projection matrix. Is this difference intended?
Here is the working code:
# Initialize a camera.
# look at the tip of the right brown bump on the cow's head
R, T = look_at_view_transform(eye=[[0.0, 5.0, 5.0]], up=[[0.0, 1.0, 0.0]], at=[[0.182289, 0.937539, -0.304334]])
print(R)
print(T)
# set each dimension to be half the height of a Hypersim image, even though this is not the intended image size
width_pixels = 384
height_pixels = 384
fov_y = np.pi / 3.0
near = 1.0
far = 100.0
# construct projection matrix
f_h = np.tan(fov_y/2.0)*near
f_w = f_h*width_pixels/height_pixels
left = -f_w
right = f_w
bottom = -f_h
top = f_h
M_proj = np.matrix(np.zeros((4,4)))
M_proj[0,0] = (2.0*near)/(right - left)
M_proj[1,1] = (2.0*near)/(top - bottom)
M_proj[0,2] = (right + left)/(right - left)
M_proj[1,2] = (top + bottom)/(top - bottom)
M_proj[2,2] = -(far + near)/(far - near)
M_proj[3,2] = -1.0
M_proj[2,3] = -(2.0*far*near)/(far - near)
M_pytorch_from_opengl = np.matrix(np.identity(4))
M_pytorch_from_opengl[3,3] = -1
K = (M_pytorch_from_opengl*M_proj).A
print(M_pytorch_from_opengl)
print(M_proj)
print(K)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T, K=[K])
# only set the actual width and height after K has been computed
width_pixels = 512
height_pixels = 384
raster_settings = RasterizationSettings(
image_size=[height_pixels, width_pixels],
blur_radius=0.0,
faces_per_pixel=1,
)
@mikeroberts3000 Thank you so much for your detailed explanation and code snippet! This is a fantastic example how to properly report issues!
I think I have an understanding of the issue and the confusion. We made some intended design choices for non-square rendering in PyTorch3D which might cause confusion in the context of OpenGL. Let me briefly explain.
Let's assume we talk about NDC coordinates throughout this post.
square images
, our renderer will render coordinates in [-1, 1]x[-1, 1]
. Anything outside that range is ignored. non-square images
, our renderer renders coordinates in [-u,u]x[-1,1]
or [-1, 1]x[-u, u]
, where u>1
is the ratio of the longest divided by the shortest side. In other words, the shortest side extends [-1, 1]
and the longest side [-u, u]
. This is different than OpenGL and indeed affects how to define cameras for the non-square image case!To recap! For square images, K
should project the part of the scene you wish to render to [-1, 1]x[-1,1]
. This is compatible with most camera conventions out there, including OpenGL. This is why you see no problem in your code!
For non-square images, according to our PyTorch3D conventions, K
is expected to project the part of the scene to be rendered to [-u,u]x[-1,1]
or [-1, 1]x[-u, u]
. If your K
does not do that and projects the scene to [-1,1]x[-1,1]
then that's an issue!
Let's look at your case! Your OpenGL camera K
is constructed to project to [-1, 1]x[-1,1]
for both square and non-square. But for (image_height, image_width) = (384, 512)
, our renderer will render in [-1.33, 1.33]x[-1,1]
, as we described above. That's the core of the issue and also why you see the cow's but (!) being squashed in the x
-axis. To address this, for non-square images your definition of K should project points to the PyTorch3D space, aka [-1.33, 1.33]x[-1,1]
. This explains why your K
defined as if the image is square with the shortest image side gave you right result, as that K
projects to [-1.33, 1.33]x[-1,1]
.
Non-square rendering is not very common in differentiable renders. We chose to go with a non-square NDC space in this case because we found it the most natural. Though, I do agree that it causes confusions when you are used to OpenGL conventions! Sorry :(
Hi @gkioxari, thanks for that detailed write-up. This makes total sense.
The good news is that all of these NDC issues (sign-flip, non-square images) can be handled cleanly by left-multiplying an appropriate M_pytorch_from_opengl
matrix. In case anyone else needs to do something similar, here is a complete working code snippet for constructing an OpenGL projection matrix, and then constructing an appropriate M_pytorch_from_opengl
matrix to perform the conversion to match the PyTorch3D conventions exactly.
# Initialize a camera.
# look at the tip of the right brown bump on the cow's head
R, T = look_at_view_transform(eye=[[0.0, 5.0, 5.0]], up=[[0.0, 1.0, 0.0]], at=[[0.182289, 0.937539, -0.304334]])
print(R)
print(T)
# set width and height to be half the size of a Hypersim image
width_pixels = 512
height_pixels = 384
fov_y = np.pi / 3.0
near = 1.0
far = 100.0
# construct an OpenGL projection matrix the same as how gluPerspective or glFrustum would construct it
f_h = np.tan(fov_y/2.0)*near
f_w = f_h*width_pixels/height_pixels
left = -f_w
right = f_w
bottom = -f_h
top = f_h
M_proj = np.matrix(np.zeros((4,4)))
M_proj[0,0] = (2.0*near)/(right - left)
M_proj[1,1] = (2.0*near)/(top - bottom)
M_proj[0,2] = (right + left)/(right - left)
M_proj[1,2] = (top + bottom)/(top - bottom)
M_proj[2,2] = -(far + near)/(far - near)
M_proj[3,2] = -1.0
M_proj[2,3] = -(2.0*far*near)/(far - near)
# construct a matrix to convert to PyTorch3D conventions from OpenGL conventions
M_pytorch_from_opengl = np.matrix(np.identity(4))
# flip the sign of the bottom row of M_proj to account for the fact that the PyTorch3D NDC space is a different handedness than OpenGL
M_pytorch_from_opengl[3,3] = -1
# scale the NDC points produced by M_proj to account for PyTorch3D's non-square NDC space
if width_pixels > height_pixels:
M_pytorch_from_opengl[0,0] = width_pixels/height_pixels
if height_pixels > width_pixels:
M_pytorch_from_opengl[1,1] = height_pixels/width_pixels
# left-multiply the OpenGL projection matrix by M_pytorch_from_opengl to construct an appropriate K matrix
K = (M_pytorch_from_opengl*M_proj).A
print(M_pytorch_from_opengl)
print(M_proj)
print(K)
# K can be passed into the FoVPerspectiveCameras constructor and perfectly matches the automatically computed K
cameras = FoVPerspectiveCameras(device=device, R=R, T=T, K=[K])
raster_settings = RasterizationSettings(
image_size=[height_pixels, width_pixels],
blur_radius=0.0,
faces_per_pixel=1,
)
# Place a point light in front of the object. As mentioned above, the front of the cow is facing the
# -z direction.
lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])
# Create a Phong renderer by composing a rasterizer and a shader. The textured Phong shader will
# interpolate the texture uv coordinates for each vertex, sample from a texture image and
# apply the Phong lighting model
renderer = MeshRenderer(
rasterizer=MeshRasterizer(
cameras=cameras,
raster_settings=raster_settings
),
shader=SoftPhongShader(
device=device,
cameras=cameras,
lights=lights
)
)
images = renderer(mesh)
plt.imsave("user_K_384_512.png", images[0, ..., :3].cpu().numpy())
plt.figure(figsize=(10, 10))
plt.imshow(images[0, ..., :3].cpu().numpy())
plt.axis("off");
Here is the resulting rendering output, which pixel-perfectly matches the results for an automatically computed K
:
@mikeroberts3000 Thank you first!
I ran into a similar problem, but it was from Unity to Pytorch3D.
For anyone who may have the same problem, I wanna share my solution here.
def transform_unity_projection_matrix(projection_matrix):
max_value = max(projection_matrix[0][0], projection_matrix[1][1])
projection_matrix[0][0] = max_value
projection_matrix[1][1] = max_value
projection_matrix[1, 2] *= -1
projection_matrix[2, 2] *= -1
projection_matrix[3, 2] *= -1
return projection_matrix
Note: I didn't use the aspect ratio to update element [0][0] and element [1][1] here because the recalculated results were not exactly identical. I assign the max value because they are supposed to be identical.
Hello there, I'm experiencing a subtle issue with the
FoVPerspectiveCameras
class that only occurs when I supply my ownK
matrix, and only for non-square images.I'm following the
render_textured_meshes
tutorial example exactly, but I'm experimenting with the following two modifications. First, I am attempting to change theRasterizationSettings
object to render non-square images. Second, I am attempting to supply my own perspective projection matrix to theFoVPerspectiveCameras
class. I expect that both modifications should be possible at the same time, but I'm getting what I think is incorrect rendering output in some cases.In my experiments, I compute my projection matrix exactly the same as the
glFrustum
OpenGL function, as documented here [1,2]. I have used the exact same code to pixel-perfectly match the rendering output from V-Ray (a commercial rendering engine) throughout the Hypersim project [3], so I consider my code to be a correct reimplementation ofglFrustum
. I take care to account for the fact that PyTorch3D's NDC coordinate conventions differ from OpenGL's, as documented here [4].I found that for square images, I get identical rendering output when I supply my own
K
matrix and when I letFoVPerspectiveCameras
compute its ownK
matrix. So far, so good.For non-square images, I get the correct rendering output when I let
FoVPerspectiveCameras
compute its ownK
matrix. I can tell it's correct because the rendering output is pixel-perfect identical to the square image case, but with more white pixels on the left and right borders of the image. This is exactly what you would expect if you keep the vertical field-of-view fixed, keep the image height fixed, and increase the image width, which is what I'm doing in my experiment. Again, so far, so good.But I get incorrect rendering output when I have a non-square image and I supply my own
K
matrix. This is the case even though I correctly (i.e., my implementation matchesglFrustum
) account for the image's aspect ratio in my code that computes the projection matrix.Here is the rendered output when
[height, width] == [384, 384]
. It is the same for an automatically computedK
and user-suppliedK
:Here is the rendered output when
[height, width] == [384, 512]
, and the field-of-view is fixed, for an automatically computedK
. In this case, I would expect the cow to be rendered pixel-for-pixel identically to the image above, but with more white pixels on the left and right of the image, and that is exactly what we get:But here is the rendered output when
[height, width] == [384, 512]
, and the field-of-view is fixed, when I supply my ownK
. In this case, I would expect an identical image to the one immediately above, but that is not what I get:Is it expected that I that an OpenGL projection matrix (e.g., as computed by
glFrustum
) can be supplied directly to theFoVPerspectiveCameras
constructor as a user-suppliedK
matrix, assuming the user takes care to flip the sign of the (3,2) entry to account for PyTorch3D's different NDC conventions? If so, I believe the behavior I'm experiencing is a bug in PyTorch3D, because I'm providing a correct (with respect to the expectation above)K
matrix that agrees with theglFrustum
documentation and accounts for the different NDC conventions, but I'm not getting the correct rendered output.I'll post code snippets in a follow-up post below.
[1] https://docs.microsoft.com/en-us/windows/win32/opengl/glfrustum [2] http://nehe.gamedev.net/article/replacement_for_gluperspective/21002/ [3] https://github.com/apple/ml-hypersim [4] https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/renderer/cameras.py#L596