facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

point cloud rendering with identity pose produces distorted images #880

Closed zhengqili closed 2 years ago

zhengqili commented 3 years ago


🐛 Bugs / Unexpected behaviors

When rendering a point cloud with an identity camera pose, the rendered images show distortion in both the foreground and the background. This is related to the issue at https://github.com/facebookresearch/pytorch3d/issues/811.

Instructions To Reproduce the Issue:

If you run the Python file I provided below, it renders images at the same viewpoint as the input image by converting predicted monocular depth into a point cloud.

What I expect and hope is that the rendered image should have no pixel shift, only the blurriness induced by point rasterization. However, the rendered image shown in img_2.png has visible pixel distortion compared with the original image img_1.png. The problem is more severe at low resolution (128x128), as in this example. The distortion is less severe if I reduce the blur radius, but that can cause black holes or boundary artifacts.

I am wondering whether this is expected, or whether there is a way to render a perfectly matched image, with only blurriness, from an identity camera pose. I don't want the background to shift after rendering (the background has ~0 disparity and should not move at all), which would be a problem in my use case.

debug.tar.gz
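
For readers without the attachment, here is a minimal sketch of the setup being described (not the attached file; the resolution, intrinsics, and depth values are hypothetical placeholders), assuming a simple pinhole model with an identity pose:

# Minimal sketch (illustrative only): unproject a predicted depth map into a
# point cloud using hypothetical pinhole intrinsics and an identity pose.
import torch

h, w = 128, 128
fx = fy = 100.0            # hypothetical focal lengths (pixels)
cx, cy = w / 2.0, h / 2.0  # hypothetical principal point

depth = torch.ones(h, w)   # stand-in for the predicted mono-depth map

# Pixel grid (u, v) -> camera-space 3D points via the inverse intrinsics.
v, u = torch.meshgrid(
    torch.arange(h, dtype=torch.float32),
    torch.arange(w, dtype=torch.float32),
)
x = (u - cx) / fx * depth
y = (v - cy) / fy * depth
points = torch.stack([x.flatten(), y.flatten(), depth.flatten()], dim=1)

# With an identity pose (R = I, t = 0), re-projecting these points with the
# same intrinsics lands exactly back on the pixel grid, so the rendered image
# should differ from the input only by rasterization blur.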

gkioxari commented 2 years ago

Hi @zl548, I will look into this soon and report back!

nikhilaravi commented 2 years ago

@zl548 did you already try the fixes mentioned in #811?

zhengqili commented 2 years ago

Yes, I tried the fixes, but it seems that even with an identity pose, point cloud rendering still displaces the colors by a few pixels.
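
A rough way to quantify this kind of shift (a hypothetical diagnostic, not something used in this thread) is to compare the rendered image against integer-shifted copies of the input and pick the translation with the lowest error:

# Hypothetical diagnostic: estimate the integer pixel shift between the input
# image and the rendered image by minimizing MSE over small translations.
import torch

def estimate_shift(original: torch.Tensor, rendered: torch.Tensor, max_shift: int = 3):
    """original, rendered: (H, W, 3) float tensors in [0, 1]."""
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(rendered, shifts=(dy, dx), dims=(0, 1))
            err = ((shifted - original) ** 2).mean().item()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err  # (dy, dx) with the lowest error, and that error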

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

gkioxari commented 2 years ago

Hey @zl548! I am now looking into this. I was wondering if you could also provide the depth from the MiDaS model in addition to the image, so that I can reproduce this. Thank you!

gkioxari commented 2 years ago

Ok, I have the solution for you. I don't need the MiDaS depth prediction, as it's irrelevant to your question. Here is a code snippet that produces the same image, and at the bottom I will walk you through the nuances of rendering that cause the differences in your case.

import numpy as np
import torch
from PIL import Image

from pytorch3d.renderer import (
    NormWeightedCompositor,
    OrthographicCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
    PointsRenderer,
)
from pytorch3d.structures import Pointclouds


class myRenderer(PointsRenderer):
    def forward(self, point_clouds, **kwargs) -> torch.Tensor:
        fragments = self.rasterizer(point_clouds, **kwargs)

        # Construct weights for compositing. The stock PointsRenderer uses a
        # function of the distance from each point to the pixel center,
        # 1 - dists2 / (r * r); here the weights are set to a constant 1 so the
        # nearest point's color is passed through unchanged.
        r = self.rasterizer.raster_settings.radius

        dists2 = fragments.dists.permute(0, 3, 1, 2)
        weights = torch.ones_like(dists2)  # instead of 1 - dists2 / (r * r)

        images = self.compositor(
            fragments.idx.long().permute(0, 3, 1, 2),
            weights,
            point_clouds.features_packed().permute(1, 0),
            **kwargs,
        )

        # permute so image comes at the end
        images = images.permute(0, 2, 3, 1)

        return images, fragments

device = torch.device("cuda:0")
# Intrinsics from the original perspective setup. Note that K, R, t and Kinv
# below are not actually used by the orthographic rendering in this snippet.
K = np.array(
    [[138.0, 0.0, 256.0], [0.0, 138.0, 256.0], [0.0, 0.0, 1.0]], dtype=np.float32
)

K[0:2, :] = K[0:2, :] / 2.0

K = torch.tensor(K, dtype=torch.float32).to(device)

R = torch.eye(3).cuda()
t = torch.zeros(3).cuda()

Kinv = K.inverse()

img = Image.open("/tmp/0-img_1.png").convert("RGB")
img = np.array(img)
img = torch.from_numpy(img).to(dtype=torch.float32, device=device) / 255.0
h, w = img.shape[:2]

depth = torch.rand((h, w), device=device) * 3.0 + 1.0  # random depth-ish

# Place one point at each pixel center in NDC: for s pixels the centers lie at
# -1 + 1/s, ..., 1 - 1/s. The signs are flipped because in PyTorch3D's NDC
# convention +X points left and +Y points up.
y, x = torch.meshgrid(
    torch.linspace(-1.0 + 1.0 / h, 1.0 - 1.0 / h, h),
    torch.linspace(-1.0 + 1.0 / w, 1.0 - 1.0 / w, w),
)
y = -y.to(device)
x = -x.to(device)

# point cloud
points = torch.stack([x.flatten(), y.flatten(), depth.flatten()], dim=1)
rgb = img.reshape(h * w, 3)
cloud = Pointclouds(points=[points], features=[rgb])

camera = OrthographicCameras(device=device)

# A tiny radius and a single point per pixel: each pixel is covered by exactly
# the point sitting at its center, with no blending across neighboring points.
raster_settings = PointsRasterizationSettings(
    image_size=(h, w),
    radius=1e-6,
    points_per_pixel=1,
)

rasterizer = PointsRasterizer(cameras=camera, raster_settings=raster_settings)
renderer = myRenderer(rasterizer=rasterizer, compositor=NormWeightedCompositor())

render_img, fragments = renderer(cloud) # output fragments for debugging
rendered_img = (render_img[0] * 255.0).cpu().numpy()
Image.fromarray(rendered_img.astype(np.uint8)).save("/tmp/0_rendered_img.png")

This produces the following images, which are the same: 0_rendered_img 0_rendered_img

Ok now, here are the nuances.

These details guarantee that rendering will give back the same image, and it does. Note that I assumed an orthographic camera and random depth, but that shouldn't matter: unprojection followed by projection onto the same image should be the identity operation. I hope this helps! Closing this issue, but feel free to follow up.
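
As a sketch of that last point (my own check, not part of the original comment): with the default OrthographicCameras (focal length 1, principal point at the origin) and an identity pose, projecting the points back to NDC returns exactly the (x, y) grid they were built from, so any residual mismatch in the rendered image can only come from the rasterization settings.

# Sketch: verify that the projection step is the identity for the setup above.
# Assumes the `camera` and `points` variables from the snippet are in scope.
projected = camera.transform_points(points)  # (H*W, 3): NDC x, y and depth
assert torch.allclose(projected[:, 0], points[:, 0], atol=1e-5)  # x unchanged
assert torch.allclose(projected[:, 1], points[:, 1], atol=1e-5)  # y unchanged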