facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

Difficulty with Back-Projection and Merging of Point Clouds Using `PerspectiveCameras.unproject_points()` #1690

Closed AbdulRehman555 closed 2 months ago

AbdulRehman555 commented 9 months ago

I am having difficulty back-projecting 2D points into 3D world points with the PerspectiveCameras.unproject_points() method. The goal is to merge the resulting 3D world points with an existing set of world points, or with the world points derived from another image.

To give you a clearer picture of my current setup, here is a snippet of the code I'm working with:

import json
import torch
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
import numpy as np
from PIL import Image
from typing import Callable, List, Optional, Tuple
from pytorch3d.io import load_objs_as_meshes, load_obj
import pytorch3d
from pytorch3d.structures import Meshes
import pytorch3d.utils
from pytorch3d.renderer import (
    FoVPerspectiveCameras,
    PointLights,
    Materials,
    RasterizationSettings,
    MeshRenderer,
    MeshRasterizer,
    HardPhongShader,
    TexturesUV,
    TexturesVertex,
    Textures
)

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    torch.cuda.set_device(device)
else:
    device = torch.device("cpu")

def get_look_at_views(points: torch.Tensor, look_at_points: torch.Tensor):
    R, T = pytorch3d.renderer.look_at_view_transform(at=look_at_points, eye=points)
    return R.to(points.device), T.to(points.device)

def generate_camera_locations(center: torch.Tensor, radius: float, num_points: int) -> torch.Tensor:
    theta = torch.linspace(0, 2 * torch.pi, num_points, dtype=center.dtype, device=center.device)
    x = center[0] + radius * torch.cos(theta)
    z = center[2] + radius * torch.sin(theta)

    camera_locations = torch.stack([z, torch.zeros_like(x), x], dim=1)
    return camera_locations
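
# The two helpers below (get_normalized_pixel_coordinates and plot_pointcloud) are
# used later but were not included in the original snippet. These are assumed,
# minimal stand-ins so the example runs end to end; the original implementations
# may differ (e.g. in the pixel-center convention).

def get_normalized_pixel_coordinates(height: int, width: int, device: torch.device) -> torch.Tensor:
    # Pixel-center coordinates in [0, 1] x [0, 1], shape (height, width, 2),
    # matching a screen-space camera built with in_ndc=False and image_size=[(1, 1)].
    ys = (torch.arange(height, device=device, dtype=torch.float32) + 0.5) / height
    xs = (torch.arange(width, device=device, dtype=torch.float32) + 0.5) / width
    y_grid, x_grid = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([x_grid, y_grid], dim=-1)

def plot_pointcloud(points: torch.Tensor) -> None:
    # Simple 3D scatter plot of an (N, 3) point tensor.
    pts = points.detach().cpu()
    fig = plt.figure(figsize=(5, 5))
    ax = fig.add_subplot(projection="3d")
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=0.5)
    plt.show()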

img_resolution = (256, 256)
raster_settings = RasterizationSettings(
    image_size=img_resolution,
    bin_size=None,
    blur_radius=0.0,
    faces_per_pixel=1,
)
lights = PointLights(device=device, location=[[-2.0, -2.0, -5.0]])
materials = Materials(
    device=device,
    specular_color=[[0.0, 0.0, 0.0]],
    shininess=0.0
)
rasterizer = MeshRasterizer(raster_settings=raster_settings)

renderer = MeshRenderer(
    rasterizer=rasterizer,
    shader=HardPhongShader(device=device, lights=lights)
)

mesh = load_objs_as_meshes(['stanford-bunny.obj'], device=device)
verts, faces = mesh.get_mesh_verts_faces(0)
texture_rgb = torch.ones_like(verts, device=device)
texture_rgb[:, 1:] *= 0.0  # red, by zeroing G and B
mesh.textures = Textures(verts_rgb=texture_rgb[None])

verts = verts - verts.mean(dim=0)
verts /= verts.max()

mesh = mesh.update_padded(verts.unsqueeze(0))
verts, faces = mesh.get_mesh_verts_faces(0)

points = generate_camera_locations(torch.tensor([0., 0., 0.], device=device), 3, 100)

R_pt3d, T_pt3d = get_look_at_views(points, torch.zeros_like(points))
K_pt3d = torch.tensor([[0.7, 0., 0.5, 0.],
                        [0., 0.7, 0.5, 0.],
                        [0., 0., 0., 1.0],
                        [0., 0., 1., 0.]], device=device)

cams = pytorch3d.renderer.cameras.PerspectiveCameras(R=R_pt3d, T=T_pt3d, K=K_pt3d.unsqueeze(0),
                                                     in_ndc=False, image_size=[(1, 1)],
                                                     device=device)

images = renderer(mesh.extend(100), cameras=cams, lights=lights)

cam1_idx = 0
cam2_idx = 80

mv_cams = cams[[cam1_idx, cam2_idx]]

fragments = rasterizer(mesh.extend(2), cameras=mv_cams)
depths = fragments.zbuf

xy_pix = get_normalized_pixel_coordinates(img_resolution[0], img_resolution[1], device=device)

xy_pix = xy_pix.flatten(0, -2)

depths = depths.flatten(1, -2)

cam1 = mv_cams[0]
depth_1 = depths[0]
xy_depth_1 = torch.cat((xy_pix, depth_1), dim=1)
pts_3d_1 = cam1.unproject_points(xy_depth_1, world_coordinates=True)
filtered_3d_pts_1 = pts_3d_1[depth_1.view(-1) != -1, :]

print(filtered_3d_pts_1.shape, filtered_3d_pts_1.min(), filtered_3d_pts_1.max())

plot_pointcloud(filtered_3d_pts_1)

cam2 = mv_cams[1]
depth_2 = depths[1]
xy_depth_2 = torch.cat((xy_pix, depth_2), dim=1)
pts_3d_2 = cam2.unproject_points(xy_depth_2, world_coordinates=True)
filtered_3d_pts_2 = pts_3d_2[depth_2.view(-1) != -1, :]

print(filtered_3d_pts_2.shape, filtered_3d_pts_2.min(), filtered_3d_pts_2.max())

plot_pointcloud(filtered_3d_pts_2)

To elaborate, my specific questions and concerns are as follows:

  1. What are the best practices or steps for accurately performing back-projection using PerspectiveCameras.unproject_points()?
  2. Once the 3D world points are obtained, what is the most efficient method for merging these points with an existing set of world points?
  3. Are there any particular considerations or common pitfalls I should be aware of during this process, especially regarding the accuracy and alignment of the merged point clouds?

Any guidance, examples, or references to relevant documentation would be immensely helpful in resolving this issue. Thank you in advance for your assistance!

bottler commented 9 months ago

  1. I suggest not using unproject_points for back-projection; it is complicated. The function get_rgbd_point_cloud is the friendly interface for back-projecting RGBD data through a camera (see the sketch after this list).

  2. The friendly way to join point cloud data together is join_pointclouds_as_scene.

  3. If you try to roll your own back-projection there are quite a few complications, like accounting for the align_corners convention and non-square pixels. get_rgbd_point_cloud should match our renderer, so there's nothing strange.
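
For reference, here is a minimal sketch of both steps applied to the setup above. It assumes get_rgbd_point_cloud can be imported from pytorch3d.implicitron.tools.point_cloud_utils and takes image/depth tensors in (N, C, H, W) layout along with a per-pixel mask; check the exact signature against your installed version.

from pytorch3d.implicitron.tools.point_cloud_utils import get_rgbd_point_cloud
from pytorch3d.structures import Pointclouds, join_pointclouds_as_scene

# Colors and depths for the two selected views, permuted to (N, C, H, W).
rgb = images[[cam1_idx, cam2_idx], ..., :3].permute(0, 3, 1, 2)  # (2, 3, H, W)
depth = fragments.zbuf[..., :1].permute(0, 3, 1, 2)              # (2, 1, H, W)
mask = (depth > 0).float()                                       # drop background pixels (zbuf == -1)

# Back-project each view through its camera, then merge into one scene.
clouds = [
    get_rgbd_point_cloud(
        mv_cams[i],
        image_rgb=rgb[i : i + 1],
        depth_map=depth[i : i + 1],
        mask=mask[i : i + 1],
    )
    for i in range(2)
]
merged: Pointclouds = join_pointclouds_as_scene(clouds)
print(merged.points_packed().shape)  # (num_points_view_1 + num_points_view_2, 3)

Because join_pointclouds_as_scene returns a single Pointclouds object, the merged result can then be rendered or processed as one scene.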

ShengCN commented 2 months ago

unproject_points is pretty complicated and unclear from the existing documentation.

get_rgbd_point_cloud works pretty well.