facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

unproject image to pointcloud in pytorch3d #1709

Closed jjjkkyz closed 8 months ago

jjjkkyz commented 9 months ago

I am trying to generate a point cloud from an image with PyTorch3D. Following https://pytorch3d.org/docs/cameras , I made a demo:

import torch
from pytorch3d.renderer import PerspectiveCameras, look_at_view_transform

# dist, elev, azim and device are defined elsewhere.
image_size = 768
fcl_screen = image_size
prp_screen = ((fcl_screen, fcl_screen),)
R, T = look_at_view_transform(dist, elev, azim)
image_size_hw = torch.tensor([image_size, image_size]).unsqueeze(0)  # keep the int and the (1, 2) tensor separate
cameras = PerspectiveCameras(focal_length=fcl_screen, principal_point=prp_screen, R=R, T=T, device=device, image_size=image_size_hw, in_ndc=False)

# Screen-space pixel grid (reversed, running from image_size - 1 down to 0).
x = torch.linspace(image_size - 1, 0, image_size)
y = torch.linspace(image_size - 1, 0, image_size)
y, x = torch.meshgrid(y, x, indexing="ij")
x = x.unsqueeze(-1).cuda()
y = y.unsqueeze(-1).cuda()

# fragments.zbuf comes from rasterizing the input mesh (see below).
xy_depth = torch.cat((x, y, fragments.zbuf[0]), dim=-1).reshape((-1, 3))
xyz = cameras.unproject_points(xy_depth, world_coordinates=True)

The depth image is generated by rendering an input mesh.
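For completeness, fragments in the snippet above would come from something like the following rasterization setup (the exact settings and the mesh variable are my assumptions, not part of the original code):

from pytorch3d.renderer import MeshRasterizer, RasterizationSettings

# `mesh` is assumed to be a pytorch3d.structures.Meshes on the same device as the cameras.
raster_settings = RasterizationSettings(image_size=768, faces_per_pixel=1)
rasterizer = MeshRasterizer(cameras=cameras, raster_settings=raster_settings)
fragments = rasterizer(mesh)
# fragments.zbuf has shape (N, H, W, faces_per_pixel), with -1 where no face was hit.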

My question is: should I use torch.linspace(image_size-1, 0, image_size) or torch.linspace(image_size, 1, image_size) to match the coordinates of fragments.zbuf? And why does torch.linspace(image_size-1, 0, image_size) work well while torch.linspace(0, image_size-1, image_size) gives a wrong result? It seems the fragments are not laid out the same way?

bottler commented 9 months ago

The function get_rgbd_point_cloud from pytorch3d.implicitron.tools.point_cloud_utils (see here) is the recommended built-in way to do unprojection. It gets the detail right.

(I think pytorch3d cameras always measure screen space from the outside of outer pixels, ie align_corners=True, so some -1s are needed.)
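For reference, a rough sketch of calling it might look like the following; image_rgb and the mask here are my assumptions, and the argument names should be checked against the linked source:

from pytorch3d.implicitron.tools.point_cloud_utils import get_rgbd_point_cloud

# depth_map and mask are expected as (N, 1, H, W); zbuf is (N, H, W, K), so permute it.
depth_map = fragments.zbuf[..., :1].permute(0, 3, 1, 2)
mask = (depth_map > 0).float()
point_cloud = get_rgbd_point_cloud(
    camera=cameras,
    image_rgb=image_rgb,   # (N, 3, H, W) rendered or input image, assumed available
    depth_map=depth_map,
    mask=mask,
)
# point_cloud is a pytorch3d.structures.Pointclouds with one point per valid depth pixel.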

jjjkkyz commented 9 months ago

> The function get_rgbd_point_cloud from pytorch3d.implicitron.tools.point_cloud_utils (see here) is the recommended built-in way to do unprojection. It gets the detail right.
>
> (I think pytorch3d cameras always measure screen space from the outside of outer pixels, ie align_corners=True, so some -1s are needed.)

Thanks for the reply. From a quick test of the code in get_rgbd_point_cloud, I found that if I project the unprojected point cloud back into the camera, I get coordinates image_height-0.5, image_height-1.5, ..., 1.5, 0.5. So does that mean pytorch3d renders an image by sampling at X.5 coordinates in screen space (torch.linspace(image_size-0.5, 0.5, image_size)), not torch.linspace(image_size-1, 0, image_size) as I posted above?
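My test was roughly the following (a sketch; image_rgb is assumed to be the rendered RGB image of shape (N, 3, H, W), and cameras/fragments are from my first post):

from pytorch3d.implicitron.tools.point_cloud_utils import get_rgbd_point_cloud

depth_map = fragments.zbuf[..., :1].permute(0, 3, 1, 2)          # (N, 1, H, W)
pts = get_rgbd_point_cloud(camera=cameras, image_rgb=image_rgb, depth_map=depth_map, mask=(depth_map > 0).float())
back = cameras.transform_points(pts.points_padded())             # project the points back into the camera
print(back[0, :, :2])
# x/y come back as image_height - 0.5, image_height - 1.5, ..., 1.5, 0.5, i.e. the renderer
# appears to sample pixel centres at X.5 in screen space:
# torch.linspace(image_size - 0.5, 0.5, image_size)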

jjjkkyz commented 9 months ago

Meanwhile, projecting the point cloud back to a depth image with cameras.transform_points(pts_3d) does not match the input depth image. What could cause this?

jjjkkyz commented 9 months ago

As I want a function (and its inverse) between a rendered image and its unprojected point cloud, simply using the built-in function may not be enough. I have rewritten my questions below; thanks for any reply.

  1. When pytorch3d renders a pixel, what is its coordinate in screen space? (i.e. [0, 1, 2, ..., image_size-1], [image_size, image_size-1, ..., 1, 0], or [0.5, 1.5, ..., image_size-0.5]?)
  2. Do camera.transform_points and camera.unproject_points meet my requirement of being a function (and its inverse) between the rendered image and its unprojected point cloud? Or do I need some other function, such as ray_bundle_to_ray_points used in get_rgbd_point_cloud?

bottler commented 9 months ago

Re 2, for a given camera type it should be possible with unproject_points. One problem with unproject_points is that its behaviour is not completely consistent between camera types.

Re 1, the edge of a pixel is the edge of the image, so in simple cases the pixel centers are at 0.5, 1.5 etc.
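In code terms (my reading of this, not an official API), the mapping between an integer pixel index and its screen-space position would be:

import torch

def pixel_index_to_screen(idx: torch.Tensor) -> torch.Tensor:
    # Pixel i covers [i, i + 1) in screen space, so its centre sits at i + 0.5.
    return idx.float() + 0.5

def screen_to_pixel_index(coord: torch.Tensor) -> torch.Tensor:
    # Inverse of the above: any coordinate inside the pixel maps back to index i.
    return torch.floor(coord).long()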

jjjkkyz commented 9 months ago

Sure. I sample a grid of points in camera A, unproject them to world space, and then transform them into camera B. It seems the points do not land in the correct place in camera B. How can I solve this?

jjjkkyz commented 9 months ago

In detail: I have a list of cameras surrounding a mesh. Pixels can be unprojected and transformed back correctly within their own camera, but they end up at wrong coordinates in the other cameras. For example, if I sample a pixel at camera A's centre (50, 50, 0.5), unproject it to world space and transform it into cameras B, C, D, ... (cameras[1].transform_points(cameras[0].unproject_points(torch.FloatTensor([[50,50,0.5]]).cuda()))), the results always contain a negative value. Since all cameras surround the mesh, any camera's centre should project inside the other cameras' bounds (0 ~ image_size), so this should not happen.
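Expanded a bit, the check looks like this (a sketch; cameras here is my list of surrounding cameras, all assumed to use the same convention):

import torch

pix = torch.FloatTensor([[50.0, 50.0, 0.5]]).cuda()          # a pixel near camera A's centre, with its depth
world = cameras[0].unproject_points(pix, world_coordinates=True)
for i in range(1, len(cameras)):
    projected = cameras[i].transform_points(world)
    # x and y should fall inside the other cameras' image bounds (0 ~ image_size),
    # but the results always contain negative values.
    print(i, projected)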

jjjkkyz commented 9 months ago

I found it is because I have NDC and screen cameras in the same list. So should I use torch.linspace(image_size-0.5, 0.5, image_size)/image_size*2-1 for the NDC cameras?

bottler commented 9 months ago

> I found it is because I have NDC and screen cameras in the same list. So should I use torch.linspace(image_size-0.5, 0.5, image_size)/image_size*2-1 for the NDC cameras?

(Easiest not to mix ndc and non-ndc.) Assuming a square image, the ndc of the (centre of the) edge pixels is probably 1-(1/image_size) and -1+(1/image_size), which matches that formula, yes.
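Concretely, for a square image the pixel-centre grid in NDC would (if I read the formula right) be:

import torch

# Screen-space pixel centres (X.5), mapped into PyTorch3D NDC for a square image.
centers_screen = torch.linspace(image_size - 0.5, 0.5, image_size)
centers_ndc = centers_screen / image_size * 2 - 1
# The edge-pixel centres end up at 1 - 1/image_size and -(1 - 1/image_size).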