facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

point cloud rendering with identity pose produces distorted images #880

Closed zhengqili closed 2 years ago

zhengqili commented 3 years ago


🐛 Bugs / Unexpected behaviors

When rendering a point cloud with an identity camera pose, the rendered images show distortion in both the foreground and the background. This is related to the issue at https://github.com/facebookresearch/pytorch3d/issues/811.

Instructions To Reproduce the Issue:

If you run the Python file I provided below, it renders images at the same viewpoint as the input image by converting predicted monocular depth into a point cloud.

What I expect and hope is that the rendered image should have no pixel shift, only the blurriness induced by point rasterization. However, the rendered image shown in img_2.png has visible pixel distortion compared with the original image img_1.png. The problem is more severe at low resolution (128x128), as in this example. The distortion is less severe if I reduce the blur radius, but that can cause black holes or boundary artifacts.

I am wondering whether this is expected, or whether there is a way to render a perfectly matched image, with only blurriness, from an identity camera pose. I don't want the background to shift after rendering (the background has ~0 disparity and should not move at all), which would be a problem in my use case.

debug.tar.gz
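
For readers without the attachment, here is a minimal sketch of the setup being described (not the attached file; the resolution, intrinsics, and depth values are hypothetical placeholders), assuming a simple pinhole model with an identity pose:

# Minimal sketch (illustrative only): unproject a predicted depth map into a
# point cloud using hypothetical pinhole intrinsics and an identity pose.
import torch

h, w = 128, 128
fx = fy = 100.0            # hypothetical focal lengths (pixels)
cx, cy = w / 2.0, h / 2.0  # hypothetical principal point

depth = torch.ones(h, w)   # stand-in for the predicted mono-depth map

# Pixel grid (u, v) -> camera-space 3D points via the inverse intrinsics.
v, u = torch.meshgrid(
    torch.arange(h, dtype=torch.float32),
    torch.arange(w, dtype=torch.float32),
)
x = (u - cx) / fx * depth
y = (v - cy) / fy * depth
points = torch.stack([x.flatten(), y.flatten(), depth.flatten()], dim=1)

# With an identity pose (R = I, t = 0), re-projecting these points with the
# same intrinsics lands exactly back on the pixel grid, so the rendered image
# should differ from the input only by rasterization blur.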

gkioxari commented 2 years ago

Hi @zl548, I will look into this soon and report back!

nikhilaravi commented 2 years ago

@zl548 did you already try the fixes mentioned in #811?

zhengqili commented 2 years ago

Yes, I tried the fixes, but it seems that even with an identity pose, point cloud rendering still displaces the colors by a few pixels.
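
A rough way to quantify this kind of shift (a hypothetical diagnostic, not something used in this thread) is to compare the rendered image against integer-shifted copies of the input and pick the translation with the lowest error:

# Hypothetical diagnostic: estimate the integer pixel shift between the input
# image and the rendered image by minimizing MSE over small translations.
import torch

def estimate_shift(original: torch.Tensor, rendered: torch.Tensor, max_shift: int = 3):
    """original, rendered: (H, W, 3) float tensors in [0, 1]."""
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(rendered, shifts=(dy, dx), dims=(0, 1))
            err = ((shifted - original) ** 2).mean().item()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err  # (dy, dx) with the lowest error, and that error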

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

gkioxari commented 2 years ago

Hey @zl548! I am now looking into this. I was wondering if you could also provide the depth from the MiDaS model in addition to the image, so that I can reproduce this. Thank you!

gkioxari commented 2 years ago

Ok, I have the solution for you. I don't need the MiDaS depth prediction, as it's irrelevant to your question. Here is a code snippet that produces the same image, and at the bottom I will walk you through the nuances of rendering that cause the differences in your case.

import numpy as np
import torch
from PIL import Image

from pytorch3d.renderer import (
    NormWeightedCompositor,
    OrthographicCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
    PointsRenderer,
)
from pytorch3d.structures import Pointclouds


class myRenderer(PointsRenderer):
    def forward(self, point_clouds, **kwargs) -> torch.Tensor:
        fragments = self.rasterizer(point_clouds, **kwargs)

        # Construct weights for compositing. The stock PointsRenderer uses a
        # function of the distance from each point to the pixel center,
        # 1 - dists2 / (r * r); here the weights are set to a constant 1 so the
        # nearest point's color is passed through unchanged.
        r = self.rasterizer.raster_settings.radius

        dists2 = fragments.dists.permute(0, 3, 1, 2)
        weights = torch.ones_like(dists2)  # instead of 1 - dists2 / (r * r)

        images = self.compositor(
            fragments.idx.long().permute(0, 3, 1, 2),
            weights,
            point_clouds.features_packed().permute(1, 0),
            **kwargs,
        )

        # permute so image comes at the end
        images = images.permute(0, 2, 3, 1)

        return images, fragments

device = torch.device("cuda:0")
# Intrinsics from the original perspective setup. Note that K, R, t and Kinv
# below are not actually used by the orthographic rendering in this snippet.
K = np.array(
    [[138.0, 0.0, 256.0], [0.0, 138.0, 256.0], [0.0, 0.0, 1.0]], dtype=np.float32
)

K[0:2, :] = K[0:2, :] / 2.0

K = torch.tensor(K, dtype=torch.float32).to(device)

R = torch.eye(3).cuda()
t = torch.zeros(3).cuda()

Kinv = K.inverse()

img = Image.open("/tmp/0-img_1.png").convert("RGB")
img = np.array(img)
img = torch.from_numpy(img).to(dtype=torch.float32, device=device) / 255.0
h, w = img.shape[:2]

depth = torch.rand((h, w), device=device) * 3.0 + 1.0  # random depth-ish

# Place one point at each pixel center in NDC: for s pixels the centers lie at
# -1 + 1/s, ..., 1 - 1/s. The signs are flipped because in PyTorch3D's NDC
# convention +X points left and +Y points up.
y, x = torch.meshgrid(
    torch.linspace(-1.0 + 1.0 / h, 1.0 - 1.0 / h, h),
    torch.linspace(-1.0 + 1.0 / w, 1.0 - 1.0 / w, w),
)
y = -y.to(device)
x = -x.to(device)

# point cloud
points = torch.stack([x.flatten(), y.flatten(), depth.flatten()], dim=1)
rgb = img.reshape(h * w, 3)
cloud = Pointclouds(points=[points], features=[rgb])

camera = OrthographicCameras(device=device)

# A tiny radius and a single point per pixel: each pixel is covered by exactly
# the point sitting at its center, with no blending across neighboring points.
raster_settings = PointsRasterizationSettings(
    image_size=(h, w),
    radius=1e-6,
    points_per_pixel=1,
)

rasterizer = PointsRasterizer(cameras=camera, raster_settings=raster_settings)
renderer = myRenderer(rasterizer=rasterizer, compositor=NormWeightedCompositor())

render_img, fragments = renderer(cloud) # output fragments for debugging
rendered_img = (render_img[0] * 255.0).cpu().numpy()
Image.fromarray(rendered_img.astype(np.uint8)).save("/tmp/0_rendered_img.png")

This produces the following images, which are the same: 0_rendered_img 0_rendered_img

Ok now, here are the nuances.

These details guarantee that rendering will give back the same image, and it does. Note that I assumed an orthographic camera and random depth, but that shouldn't matter: unprojection followed by projection onto the same image should be the identity operation. I hope this helps! Closing this issue, but feel free to follow up.
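
As a sketch of that last point (my own check, not part of the original comment): with the default OrthographicCameras (focal length 1, principal point at the origin) and an identity pose, projecting the points back to NDC returns exactly the (x, y) grid they were built from, so any residual mismatch in the rendered image can only come from the rasterization settings.

# Sketch: verify that the projection step is the identity for the setup above.
# Assumes the `camera` and `points` variables from the snippet are in scope.
projected = camera.transform_points(points)  # (H*W, 3): NDC x, y and depth
assert torch.allclose(projected[:, 0], points[:, 0], atol=1e-5)  # x unchanged
assert torch.allclose(projected[:, 1], points[:, 1], atol=1e-5)  # y unchanged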