facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

Rendering depth map with Pytorch3D vs Open3D #1753

Closed GasparPizarro closed 8 months ago

GasparPizarro commented 8 months ago

I am rendering depth maps with PyTorch3D and, given the same camera parameters and pose, they do not match the ones I get with Open3D. Using the teapot from the tutorial on camera position optimization, I get a depth map as suggested in #35.

import numpy as np
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    PerspectiveCameras,
    look_at_view_transform,
    MeshRasterizer,
    RasterizationSettings,
    TexturesVertex,
)
from pytorch3d.io import load_obj
import matplotlib.pyplot as plt

verts, faces_idx, _ = load_obj("./teapot.obj")
faces = faces_idx.verts_idx

mesh = Meshes(
    verts=[verts],
    faces=[faces],
    textures=TexturesVertex(verts_features=torch.ones_like(verts)[None]),
)

height = 512
width = 512
fx = 1000
fy = 1000
cx = width / 2
cy = height / 2

distance = 3  # distance from camera to the object
elevation = 40.0  # angle of elevation in degrees
azimuth = 40.0  # angle of azimuth in degrees

# Get the position of the camera based on the spherical angles
R, T = look_at_view_transform(distance, elevation, azimuth)

cameras = PerspectiveCameras(
    image_size=[[height, width]],  # PyTorch3D expects (height, width)
    R=R,  # look_at_view_transform already returns batched (1, 3, 3) and (1, 3)
    T=T,
    focal_length=torch.tensor([[fx, fy]], dtype=torch.float32),
    principal_point=torch.tensor([[cx, cy]], dtype=torch.float32),
    in_ndc=False,  # intrinsics are given in screen (pixel) coordinates
)

rasterizer = MeshRasterizer(
    cameras=cameras,
    raster_settings=RasterizationSettings(
        image_size=[height, width],
    ),
)

fragments = rasterizer(meshes_world=mesh, R=R, T=T)

plt.figure()
plt.imshow(fragments.zbuf[0, :, :, 0])
plt.title("pytorch3d")
plt.show()

and I get this image: [PyTorch3D depth map]
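
In case it matters for the comparison, pixels where no face is rasterized get zbuf = -1 in PyTorch3D, so I mask them out before comparing, e.g.:

depth = fragments.zbuf[0, ..., 0].clone()
depth[depth < 0] = float("nan")  # zbuf is -1 where no face covers the pixel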

However, if I do it with Open3D, using ray casting with the same R and T (but changing the axes orientations to match PyTorch3D's convention):

from pytorch3d.renderer import look_at_view_transform
import open3d as o3d
import numpy as np
import matplotlib.pyplot as plt

mesh = o3d.t.io.read_triangle_mesh("./teapot.obj")

scene = o3d.t.geometry.RaycastingScene()
scene.add_triangles(mesh)

height = 512
width = 512
fx = 1000
fy = 1000
cx = width / 2
cy = height / 2

distance = 3  # distance from camera to the object
elevation = 40.0  # angle of elevation in degrees
azimuth = 40.0  # angle of azimuth in degrees

# Get the position of the camera based on the spherical angles
R, T = look_at_view_transform(distance, elevation, azimuth)

R = R[0].numpy()
T = T[0].numpy()

intrinsics = np.array([
    [fx, 0, cx],
    [0, fy, cy],
    [0, 0, 1]
])

# build a 4x4 world-to-camera extrinsic from R and T
pose = np.vstack((np.hstack((R, T[:, None])), np.array([0, 0, 0, 1])))

# flip the x and y axes to go from PyTorch3D's camera convention (+X left, +Y up)
# to Open3D's (+X right, +Y down)
pose[:2] = pose[:2] * -1

rays = o3d.t.geometry.RaycastingScene.create_rays_pinhole(
    intrinsic_matrix=intrinsics,
    extrinsic_matrix=pose,
    width_px=int(width),
    height_px=int(height),
)

ans = scene.cast_rays(rays)

plt.figure()
plt.imshow(ans["t_hit"].numpy())
plt.title("open3d")
plt.show()

I get this:

[Open3D depth map]

It can be seen that the silhouettes of the two renders differ (regardless of the actual depth values), which suggests to me that there is a difference in how Open3D and PyTorch3D interpret R and T, or an error in my conversion from one to the other.
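
Aside from the pose question, I know the raw values are not directly comparable either: fragments.zbuf holds the per-pixel z coordinate in camera (view) space, while t_hit is the distance along each ray. A sketch of how t_hit could be converted to a z-depth map, assuming Open3D's convention that extrinsic_matrix is world-to-camera:

rays_np = rays.numpy()  # (H, W, 6): ray origin (3) + ray direction (3)
t = ans["t_hit"].numpy()  # distance along each ray, inf where nothing was hit
hit_points = rays_np[..., :3] + t[..., None] * rays_np[..., 3:]  # world-space hit points
hit_h = np.concatenate([hit_points, np.ones_like(t)[..., None]], axis=-1)
z_depth = (hit_h @ pose.T)[..., 2]  # z component after applying the world-to-camera extrinsic
z_depth[~np.isfinite(t)] = np.nan  # mask rays that missed the mesh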

What should I do to get the same depth map from both approaches?

By the way, if I leave the pose intact for Open3D, without doing pose[:2] = pose[:2] * -1, I get this:

[Open3D depth map without the axis flip]

bottler commented 8 months ago

We can't help you debug this. In your inputs there is no camera rotation: just elevation and azimuth. So the handle and the spout should be at the same level if "up" on the teapot is the up direction that look_at_view_transform assumes. But the Open3D output doesn't show this. So the inputs to Open3D aren't doing what you think they are.
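
One quick way to check is to compare the camera center PyTorch3D computes with the one implied by the extrinsic handed to Open3D. A sketch using the variables from the snippets above:

# camera center according to PyTorch3D
print(cameras.get_camera_center())  # (1, 3)

# camera center implied by a world-to-camera extrinsic in column-vector
# (Open3D/OpenCV) convention: C = -R_wc^T @ t_wc
R_wc, t_wc = pose[:3, :3], pose[:3, 3]
print(-R_wc.T @ t_wc)

If the two disagree, the extrinsic is not describing the pose you think it is.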

frederiknolte commented 7 months ago

Use the inverse of R. It seems like PyTorch3D and Open3D have different interpretations of the rotation (camera-to-world vs. world-to-camera, or the other way around).
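
Concretely, a sketch against the snippet above: PyTorch3D applies rotations to row vectors (X_view = X @ R + T), so the column-vector world-to-camera rotation that Open3D expects is the transpose (which equals the inverse, since R is orthogonal):

# transpose R when assembling the Open3D extrinsic; keep the axis flip
pose = np.vstack((np.hstack((R.T, T[:, None])), np.array([0, 0, 0, 1])))
pose[:2] = pose[:2] * -1  # PyTorch3D (+X left, +Y up) -> Open3D (+X right, +Y down)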