facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

MeshRenderer gives unexpected result for non-square image_size #1817

Closed (oneleggedredcow closed 5 months ago)

oneleggedredcow commented 5 months ago

🐛 Bugs / Unexpected behaviors

The image produced by MeshRenderer does not align with the vertices projected by PerspectiveCameras.transform_points_screen.

Instructions To Reproduce the Issue:

This fairly simple example shows the issue:

import numpy as np
import matplotlib.pyplot as plt
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import MeshRenderer, MeshRasterizer, RasterizationSettings, SoftPhongShader
from pytorch3d.renderer import TexturesVertex
from pytorch3d.utils import cameras_from_opencv_projection

# Camera parameters
R = np.eye(3)
t = np.array([0, 0, 5])
f = 1731.2994092344502
w, h = 1280, 720
K = np.array([
    [f, 0, w/2],
    [0, f, h/2],
    [0, 0, 1],
])

# Mesh parameters
verts = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0],
])
faces = torch.tensor([[[0, 1, 2]]])

verts = torch.FloatTensor(verts).unsqueeze(0)
tex = TexturesVertex(torch.ones_like(verts))

# Create camera
camera = cameras_from_opencv_projection(
    torch.FloatTensor(R).unsqueeze(0),
    torch.FloatTensor(t).unsqueeze(0),
    torch.FloatTensor(K).unsqueeze(0),
    torch.Tensor([w, h]).unsqueeze(0),
)

# Create mesh and renderer
mesh = Meshes(verts, faces, tex)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        raster_settings=RasterizationSettings(image_size=(h, w)),
        cameras=camera,
    ),
    shader=SoftPhongShader(cameras=camera),
)

# Render image
mesh_img = renderer(mesh)
mesh_img = mesh_img.squeeze().cpu().numpy()

# Project vertices
points = camera.transform_points_screen(mesh.verts_packed())
points = points.squeeze().cpu().numpy()[:, :2]

# Plot
plt.figure(figsize=(10, 5))
plt.imshow(mesh_img)

for pt in points:
    plt.scatter(pt[0], pt[1], color='red', s=50)

plt.show()

This code will produce the following image:

[Image: Figure_3, the rendered triangle with the projected vertices overlaid as red dots]

I expected the red dots to align with the corners of the triangle rendered by MeshRenderer, but they do not.

The PerspectiveCameras.transform_points_screen result appears to be correct.
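
One way to cross-check the projected points independently of PyTorch3D is to apply the plain pinhole model by hand. The sketch below assumes the OpenCV convention (x_cam = R X + t, then u = f x / z + cx and v = f y / z + cy) and reuses the values from the script above. `project_opencv` is a hypothetical helper written for this illustration, not part of any library.

```python
# Camera and mesh values copied from the reproduction script above.
f = 1731.2994092344502
w, h = 1280, 720
cx, cy = w / 2, h / 2
t = (0.0, 0.0, 5.0)  # R is the identity, so x_cam = X + t
verts = [
    (0.0, 0.0, 0.0),
    (0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0),
]


def project_opencv(point):
    """Project a world-space point to pixel coordinates (u, v).

    Hypothetical helper: assumes the OpenCV pinhole model with an
    identity rotation, no skew, and no lens distortion.
    """
    x, y, z = (p + ti for p, ti in zip(point, t))
    return (f * x / z + cx, f * y / z + cy)


pixels = [project_opencv(v) for v in verts]
for v, (u_px, v_px) in zip(verts, pixels):
    print(f"{v} -> u={u_px:.2f}, v={v_px:.2f}")
```

The first vertex lies on the optical axis, so it lands exactly on the principal point (640, 360). The other two are offset by f * 0.5 / 5, about 173.13 pixels.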

The problem only occurs when the image_size is non-square.

Possible Fix:

I looked at the rasterize_meshes_python function and changed the code as follows:

...
    # Loop through meshes in the batch.
    for n in range(N):
        face_start_idx = mesh_to_face_first_idx[n]
        face_stop_idx = face_start_idx + num_faces_per_mesh[n]

        # Iterate through the horizontal lines of the image from top to bottom.
        for yi in range(H):
            # Y coordinate of one end of the image. Reverse the ordering
            # of yi so that +Y is pointing up in the image.
-            yfix = H - 1 - yi
-            yf = pix_to_non_square_ndc(yfix, H, W)
+            yfix = W - 1 - yi
+            yf = pix_to_non_square_ndc(yfix, W, H)

            # Iterate through pixels on this horizontal line, left to right.
            for xi in range(W):
                # X coordinate of one end of the image. Reverse the ordering
                # of xi so that +X is pointing to the left in the image.
-                xfix = W - 1 - xi
-                xf = pix_to_non_square_ndc(xfix, W, H)
+                xfix = H - 1 - xi
+                xf = pix_to_non_square_ndc(xfix, H, W)
                top_k_points = []
...

I then created and used a PythonMeshRasterizer class whose only difference from MeshRasterizer is that it calls the modified rasterize_meshes_python given above instead of rasterize_meshes. With that change, the output seems to be as expected.

However, this is a bit of hackery and not really a proper fix.

bottler commented 5 months ago

The documentation of cameras_from_opencv_projection at https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/utils/camera_conversions.py#L57 suggests that the fourth argument, image size, should have height before width. If I change the call to that function in your example to

camera = cameras_from_opencv_projection(
    torch.FloatTensor(R).unsqueeze(0),
    torch.FloatTensor(t).unsqueeze(0),
    torch.FloatTensor(K).unsqueeze(0),
    torch.Tensor([h, w]).unsqueeze(0),
)

then the output looks okay. I don't think there's anything wrong here.
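
Since the swapped ordering ran without an error in the original script, a small guard before building the camera can catch it early. The following is only a heuristic sketch: `looks_swapped` is a hypothetical helper, and it assumes the principal point lies near the image center. That holds for the K in this issue but not for every calibration.

```python
def looks_swapped(K, image_size, tol=0.25):
    """Heuristically detect an image_size given as (width, height).

    K is a 3x3 intrinsic matrix (nested sequences); image_size is expected
    as (height, width). Assumes the principal point (cx, cy) is roughly at
    the image center, which is an assumption, not a guarantee.
    """
    cx, cy = K[0][2], K[1][2]
    height, width = image_size

    def mismatch(h, w):
        # Relative distance of the principal point from the center of an
        # h-by-w image.
        return abs(cx - w / 2) / w + abs(cy - h / 2) / h

    as_given = mismatch(height, width)
    flipped = mismatch(width, height)
    # Flag only when the flipped ordering fits clearly better.
    return flipped + tol < as_given


f, w, h = 1731.2994092344502, 1280, 720
K = [[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]]
print(looks_swapped(K, (w, h)))  # ordering used in the original report
print(looks_swapped(K, (h, w)))  # ordering described in the documentation
```

For these values the first call reports a likely swap and the second does not. A square image can never be flagged, which matches the report that the problem only shows up for non-square sizes.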