facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

Some Error of Rasterization on CUDA and CPU #1523

Closed GloryyrolG closed 1 year ago

GloryyrolG commented 1 year ago

🐛 Bugs / Unexpected behaviors

The zbuf values returned by the rasterizer differ between devices, e.g., 37.4441 on CUDA vs. 37.4562 on CPU. Coincidentally, I found the same problem in torch.einsum (if rasterization also uses this function internally, that may be the cause). Specifically, rotating points with an identity matrix via einsum introduces a small error on CUDA, whereas on CPU the result is exactly unchanged (see perspective_projection below).
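For reference, a minimal self-contained sketch (my addition, not part of the original report; the tensor values are made up) that checks whether an identity-rotation einsum leaves the points untouched on a given device:

    import torch

    def identity_einsum_drift(device: str) -> torch.Tensor:
        # Random batch of points, shape (B, N, 3).
        torch.manual_seed(0)
        points = torch.randn(2, 100, 3, device=device)
        # Identity rotation, shape (B, 3, 3): the rotated points should equal the input.
        rotation = torch.eye(3, device=device).expand(2, -1, -1)
        rotated = torch.einsum('bij,bkj->bki', rotation, points)
        return (rotated - points).abs().max()

    print("cpu drift :", identity_einsum_drift("cpu"))    # expected: exactly 0
    print("cuda drift:", identity_einsum_drift("cuda"))   # may be nonzero (~1e-5) on some builds/GPUs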

Instructions To Reproduce the Issue:

Please include the following (depending on what the issue is):

  1. Any changes you made (git diff) or code you wrote:
    
    import os
    os.environ["PYOPENGL_PLATFORM"] = "egl"
    from typing import Optional
    from matplotlib import pyplot as plt
    import numpy as np
    import torch
    from pytorch3d.renderer import (
        RasterizationSettings,
        MeshRasterizer,
        MeshRenderer,
        TexturesVertex,
        HardPhongShader,
    )
    from pytorch3d.structures import Meshes
    from pytorch3d.utils.camera_conversions import cameras_from_opencv_projection

    def perspective_projection(points: torch.Tensor,
                               translation: torch.Tensor,
                               focal_length: torch.Tensor,
                               camera_center: Optional[torch.Tensor] = None,
                               rotation: Optional[torch.Tensor] = None) -> torch.Tensor:
        """
        Computes the perspective projection of a set of 3D points.

        Args:
            points (torch.Tensor): Tensor of shape (B, N, 3) containing the input 3D points.
            translation (torch.Tensor): Tensor of shape (B, 3) containing the 3D camera translation.
            focal_length (torch.Tensor): Tensor of shape (B, 2) containing the focal length in pixels.
            camera_center (torch.Tensor): Tensor of shape (B, 2) containing the camera center in pixels.
            rotation (torch.Tensor): Tensor of shape (B, 3, 3) containing the camera rotation.
        Returns:
            torch.Tensor: Tensor of shape (B, N, 2) containing the projection of the input points.
        """
        batch_size = points.shape[0]
        if rotation is None:
            rotation = torch.eye(3, device=points.device, dtype=points.dtype).unsqueeze(0).expand(batch_size, -1, -1)
        if camera_center is None:
            camera_center = torch.zeros(batch_size, 2, device=points.device, dtype=points.dtype)

        # Populate intrinsic camera matrix K.
        K = torch.zeros([batch_size, 3, 3], device=points.device, dtype=points.dtype)
        K[:, 0, 0] = focal_length[:, 0]
        K[:, 1, 1] = focal_length[:, 1]
        K[:, 2, 2] = 1.
        K[:, :-1, -1] = camera_center

        # Transform points
        rotation_bef = rotation.clone()
        points_bef = points.clone()
        print("rotation_bef[0, 0]", rotation_bef[0, 0])
        print("points_bef[0, 0]", points_bef[0, 0])
        ######### Problems are here!
        points = torch.einsum('bij,bkj->bki', rotation, points)
        print("(rotation - rotation_bef).abs().mean()", (rotation - rotation_bef).abs().mean())
        print("(points - points_bef).abs().mean()", (points - points_bef).abs().mean())
        print("points[0, 0] after einsum", points[0, 0])
        points = points + translation.unsqueeze(1)

        # Apply perspective distortion
        projected_points = points / points[:, :, -1].unsqueeze(-1)

        # Apply camera intrinsics
        projected_points = torch.einsum('bij,bkj->bki', K, projected_points)

        return projected_points[:, :, :-1]
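
As a small aside (my addition, not part of the report), the same contraction can also be written with torch.matmul; comparing both forms on CUDA against the untouched points can help tell whether the drift is specific to einsum's code path or to the underlying batched-matmul kernel:

    # Hypothetical cross-check with made-up tensors; batch size and point count are arbitrary.
    pts = torch.randn(1, 8, 3, device='cuda')
    eye = torch.eye(3, device='cuda').expand(1, -1, -1)
    out_einsum = torch.einsum('bij,bkj->bki', eye, pts)
    out_matmul = torch.matmul(pts, eye.transpose(1, 2))  # algebraically the same product
    print("einsum drift:", (out_einsum - pts).abs().max())
    print("matmul drift:", (out_matmul - pts).abs().max())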

    # Args
    focal_length = 5000
    image_size = 224
    faces_per_pixel = 2
    vizimg = True

    dtype = torch.float32
    device = torch.device('cuda')  # torch.device('cpu')

    npz = np.load('mesh.npz', mmap_mode='r')
    joints, vertices, faces, cam_t = (
        torch.from_numpy(npz['joints']).to(device),
        torch.from_numpy(npz['vertices']).to(device),
        torch.from_numpy(npz['faces']).to(device),
        torch.from_numpy(npz['cam_t']).to(device))
    B = joints.shape[0]

    textures = TexturesVertex(torch.ones_like(vertices))
    meshes = Meshes(vertices, faces, textures=textures)

    # Using the OpenCV coord sys
    cameras = cameras_from_opencv_projection(
        torch.eye(3, device=device)[None, ...].repeat(B, 1, 1),
        cam_t,
        torch.tensor([[focal_length, 0, image_size / 2],
                      [0, focal_length, image_size / 2],
                      [0, 0, 1]], device=device)[None, ...].repeat(B, 1, 1),
        torch.ones(B, 2, device=device) * image_size)
    raster_settings = RasterizationSettings(image_size=image_size, faces_per_pixel=faces_per_pixel)
    rasterizer = MeshRasterizer(cameras=cameras, raster_settings=raster_settings)

    zbufs = rasterizer(meshes).zbuf  # shape: (N, H, W, K)

    renderer = MeshRenderer(rasterizer, HardPhongShader(device=device, cameras=cameras))
    images = renderer(meshes)

    keypoints_2d = perspective_projection(
        joints, cam_t,
        torch.ones(B, 2, device=device) * focal_length,
        camera_center=torch.ones(B, 2, device=device) * image_size / 2)

    jid = 25 + 6
    print("keypoints_2d[0, jid]", keypoints_2d[0, jid])
    print("zbufs[0, 23, 134]", zbufs[0, 23, 134])
    print("joints[0, jid, 2] + cam_t[0, 2]", joints[0, jid, 2] + cam_t[0, 2])

    if vizimg:
        plt.close()
        plt.imshow(images[0].cpu().numpy())
        plt.savefig('image.png')

2. The exact command(s) you ran [NO]
3. What you observed (including the full logs):
GPU results

    rotation_bef[0, 0] tensor([1., 0., 0.], device='cuda:0')
    points_bef[0, 0] tensor([ 0.1155, -0.8138, -0.2963], device='cuda:0')
    points[0, 0] after einsum tensor([ 0.1155, -0.8140, -0.2964], device='cuda:0')
    (rotation - rotation_bef).abs().mean() tensor(0., device='cuda:0')
    (points - points_bef).abs().mean() tensor(6.2104e-05, device='cuda:0')
    keypoints_2d[0, jid] tensor([133.5530, 23.0413], device='cuda:0')
    zbufs[0, 23, 134] tensor([37.4441, 37.5067], device='cuda:0')
    joints[0, jid, 2] + cam_t[0, 2] tensor(37.5164, device='cuda:0')

CPU results

    rotation_bef[0, 0] tensor([1., 0., 0.])
    points_bef[0, 0] tensor([ 0.1155, -0.8138, -0.2963])
    (rotation - rotation_bef).abs().mean() tensor(0.)
    (points - points_bef).abs().mean() tensor(0.)
    points[0, 0] after einsum tensor([ 0.1155, -0.8138, -0.2963])
    keypoints_2d[0, jid] tensor([133.5542, 22.9965])
    zbufs[0, 23, 134] tensor([37.4562, 37.5189])
    joints[0, jid, 2] + cam_t[0, 2] tensor(37.5164)
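
For what it's worth, here is a self-contained sketch (my addition; it does not use the mesh.npz from the report, and the single triangle and default camera are made up) of how the z-buffers produced on CPU and CUDA can be compared directly, restricted to pixels covered on both devices:

    import torch
    from pytorch3d.renderer import FoVPerspectiveCameras, MeshRasterizer, RasterizationSettings
    from pytorch3d.structures import Meshes

    def zbuf_on(device):
        # One triangle in front of the default camera (z between znear and zfar).
        verts = torch.tensor([[[-0.5, -0.5, 2.0], [0.5, -0.5, 2.0], [0.0, 0.5, 2.0]]], device=device)
        faces = torch.tensor([[[0, 1, 2]]], device=device)
        meshes = Meshes(verts=verts, faces=faces)
        cameras = FoVPerspectiveCameras(device=device)
        settings = RasterizationSettings(image_size=64, faces_per_pixel=1)
        return MeshRasterizer(cameras=cameras, raster_settings=settings)(meshes).zbuf

    zb_cpu = zbuf_on("cpu")
    zb_gpu = zbuf_on("cuda").cpu()
    shared = (zb_cpu >= 0) & (zb_gpu >= 0)   # ignore pixels covered on only one device
    print("max |zbuf_cpu - zbuf_gpu| on shared pixels:", (zb_cpu - zb_gpu)[shared].abs().max())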

GloryyrolG commented 1 year ago

I think the cause is that my PyTorch build (v1.9.0) is not well adapted to newer GPUs, as per the linked issue.
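
One way to probe that hypothesis (my suggestion, not confirmed in this thread) is to disable TF32 matmuls, which recent Ampere-class GPUs use by default in this PyTorch version and which trades matmul precision for speed, and then rerun the einsum check:

    import torch

    # Assumption: the drift comes from TF32 matmul kernels; if so, disabling them
    # should bring the CUDA einsum result much closer to the CPU result.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False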