dcharatan / pixelsplat

[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann
http://davidcharatan.com/pixelsplat/
MIT License
830 stars 56 forks

Question about calculating the position of gaussian #82

Closed kevinchiu19 closed 2 months ago

kevinchiu19 commented 2 months ago

Thanks for sharing your great effort and work.

I have a small question about this code in src/model/encoder/common/gaussian_adapter.py:

# Compute Gaussian means.
origins, directions = get_world_rays(coordinates, extrinsics, intrinsics)
means = origins + directions * depths[..., None]

The directions used in this calculation have been normalized, so the result will not be the correct position in the world coordinate system.

For example, in an autonomous driving scene, I use the xyz coordinates of the point cloud directly as the Gaussian positions. I tried both normalized and unnormalized directions, and the results are shown in the figure below.

[image]

dcharatan commented 2 months ago

Depth can either be defined as distance along the ray or as Z depth (distance along the camera look vector/Z coordinate in camera space). Since depth is predicted by a neural network, the convention that's being used doesn't matter—the network will simply learn whatever convention is being used.

If you want to switch to the other convention (Z depth), you can replace the line that normalizes the ray directions with the following:

directions = directions / directions[..., -1:]

This will normalize by the Z coordinate instead of by ray length.
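
To make the difference between the two conventions concrete, here is a minimal NumPy sketch. It is not the repository's code: the intrinsics matrix K, the helper camera_rays, and the sample pixels are hypothetical, and everything is done in camera space, where the last coordinate of a direction is the Z component. It shows that both conventions recover the same point when the depth value matches the direction scaling, and that mixing them (Z depth applied along a unit-length ray) gives a different point for off-center pixels.

```python
import numpy as np

# Hypothetical pinhole intrinsics in pixel units (not taken from the repository).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])


def camera_rays(pixels, K):
    """Unproject pixel coordinates (N, 2) to camera-space directions with z = 1."""
    ones = np.ones((pixels.shape[0], 1))
    homogeneous = np.concatenate([pixels, ones], axis=-1)
    # Row-vector form of K^-1 @ [u, v, 1]^T.
    return homogeneous @ np.linalg.inv(K).T


pixels = np.array([[320.0, 240.0],   # principal point, ray along the optical axis
                   [620.0, 240.0]])  # pixel far from the image center
rays = camera_rays(pixels, K)

# Convention 1: unit-length directions, depth = distance along the ray.
unit_dirs = rays / np.linalg.norm(rays, axis=-1, keepdims=True)

# Convention 2: directions scaled so that z = 1, depth = Z depth.
z_dirs = rays / rays[..., -1:]

# The same surface point described under each convention.
z_depth = np.array([10.0, 10.0])
ray_distance = z_depth / unit_dirs[..., -1]  # convert Z depth to ray distance

points_from_z_depth = z_dirs * z_depth[..., None]
points_from_distance = unit_dirs * ray_distance[..., None]

# Mixing conventions: Z depth applied along unit-length directions.
mismatched = unit_dirs * z_depth[..., None]

print(points_from_z_depth)
print(points_from_distance)
print(mismatched)
```

For the off-center pixel, the two consistent conventions both give the point (6, 0, 10), while the mismatched version lands closer to the camera. This matches the situation in the question: point cloud coordinates imply Z depth (or true ray distance), so they only line up with the computed means when the direction scaling uses the same convention. When depth is predicted by the network, either choice works as long as it is used consistently.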

kevinchiu19 commented 2 months ago

OK, thank you for the detailed explanation, and thank you again for your great work!