dcharatan / pixelsplat

[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann
http://davidcharatan.com/pixelsplat/
MIT License

Questions about implementing pixelsplat on scannet dataset #49

Closed railgun526 closed 7 months ago

railgun526 commented 7 months ago

Hi @dcharatan, first I'd like to thank you for your excellent work and congratulate you on the CVPR acceptance of your paper! I have recently been working on running pixelSplat on the ScanNet dataset, but I have run into a few problems.

The first is about the intrinsic matrix. For ScanNet, the intrinsic matrix looks like this: tensor([[[[72.2301, 0.0000, 39.9825], [0.0000, 72.2301, 29.8596], [0.0000, 0.0000, 1.0000]]]]), but for this repo on re10k, it looks like [0.8900, 0.0000, 0.5000], [0.0000, 0.8902, 0.5000], [0.0000, 0.0000, 1.0000]. Have you made any modifications to the original intrinsic matrix, such as changing the unit from millimeters to pixels?

The second is that I don't fully understand what the "near" and "far" planes mean in this setting. I have seen your comments in #25, but I still don't know how to set the distance to make disparity negligible. Could you please provide some example code? I'd appreciate any help!

dcharatan commented 7 months ago

Generally, intrinsics are provided in units of pixels in the following format:

[[fx,  0, cx],
 [ 0, fy, cy],
 [ 0,  0,  1]]

The intrinsics pixelSplat uses are normalized, i.e., the first row is divided by w, and the second row is divided by h. This makes it so the intrinsics don't change if you uniformly scale the image.
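As a minimal sketch of the normalization described above (not code from the repo), the following divides the first row by the width and the second row by the height, then checks that the result is unchanged when the image is uniformly downscaled. The function name `normalize_intrinsics` and the 640x480 intrinsics are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalize_intrinsics(K_pixels, width, height):
    """Divide the first row by the image width and the second row by the height."""
    K = np.asarray(K_pixels, dtype=np.float64).copy()
    K[0, :] /= width
    K[1, :] /= height
    return K


# Hypothetical pixel intrinsics for a 640x480 image (illustrative values only).
K_full = np.array([[500.0, 0.0, 320.0],
                   [0.0, 500.0, 240.0],
                   [0.0, 0.0, 1.0]])
K_norm = normalize_intrinsics(K_full, width=640, height=480)

# Uniformly downscaling the image by 2 halves fx, fy, cx, and cy in pixels...
K_half = K_full.copy()
K_half[:2, :] /= 2.0

# ...but the normalized intrinsics are identical.
assert np.allclose(K_norm, normalize_intrinsics(K_half, width=320, height=240))
print(K_norm)
```

Here the principal point lands at (0.5, 0.5) because the hypothetical cx and cy sit exactly at the image center.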

I'm not sure what format ScanNet uses, but if the ScanNet intrinsics are in units of pixels, the same division should apply. The slightly suspicious thing is that if you do this, cx and cy aren't 0.5 anymore, which is the case for almost all uncropped images. Are you sure the ScanNet intrinsics haven't been modified somehow? What size are the images these intrinsics correspond to?

As for the near and far planes, you can ignore the ones set inside the dataset itself. These are only used by the baselines, which don't automatically choose near and far planes. pixelSplat will automatically pick near/far planes using the code in src/datasets/shims/bounds_shim.py.

railgun526 commented 7 months ago

Thank you, it works!

Pixie8888 commented 7 months ago

> Generally, intrinsics are provided in units of pixels in the following format:
>
> [[fx,  0, cx],
>  [ 0, fy, cy],
>  [ 0,  0,  1]]
>
> The intrinsics pixelSplat uses are normalized, i.e., the first row is divided by fx, and the second row is divided by fy. This makes it so the intrinsics don't change if you uniformly scale the image.
>
> I'm not sure what format ScanNet uses, but if the ScanNet intrinsics are in units of pixels, the same division should apply. The slightly suspicious thing is that if you do this, cx and cy aren't 0.5 anymore, which is the case for almost all uncropped images. Are you sure the ScanNet intrinsics haven't been modified somehow? What size are the images these intrinsics correspond to?
>
> As for the near and far planes, you can ignore the ones set inside the dataset itself. These are only used by the baselines, which don't automatically choose near and far planes. pixelSplat will automatically pick near/far planes using the code in src/datasets/shims/bounds_shim.py.

Hi @dcharatan, the ScanNet intrinsics are in units of pixels, and they correspond to an image size of (h, w) = (968, 1296). But why are the intrinsics divided by (fx, fy)? According to the README, I think they should be divided by (w, h). I divided intrinsic[:2] by (w, h) and plotted the epipolar line, as shown below:

[image: epipolar line plot]

In addition, I noticed that cx and cy are always 0.5 in your data. But after dividing the ScanNet intrinsics, cx and cy won't be 0.5. Will that be a problem?
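For readers who want to reproduce an epipolar-line check like the one mentioned above, here is a rough sketch using the standard fundamental-matrix relation. It is not taken from the repo: it assumes the relative pose (R, t) maps camera-1 coordinates to camera-2 coordinates, which may differ from the extrinsics convention your data loader uses, and all numeric values are made up.

```python
import numpy as np


def skew(t):
    """Cross-product matrix, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K1, K2, R, t):
    """F with x2^T F x1 = 0, assuming X2 = R @ X1 + t (camera 1 -> camera 2)."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


# Hypothetical normalized intrinsics shared by both views.
K = np.array([[0.9, 0.0, 0.5],
              [0.0, 0.9, 0.5],
              [0.0, 0.0, 1.0]])

# Hypothetical relative pose: a 5-degree rotation about y plus a small translation.
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.2, 0.0, 0.05])

# Project one 3D point (given in camera-1 coordinates) into both views.
X1 = np.array([0.1, -0.2, 2.0])
X2 = R @ X1 + t
x1 = K @ X1 / X1[2]
x2 = K @ X2 / X2[2]

F = fundamental_matrix(K, K, R, t)
line = F @ x1        # (a, b, c): a*u + b*v + c = 0 in view 2
residual = x2 @ line  # close to zero when intrinsics and pose are consistent
print(residual)
```

If the intrinsics are normalized incorrectly, the matching point generally no longer lies on the plotted line, which is what makes this a useful sanity check.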

dcharatan commented 7 months ago

@Pixie8888 You're right. The README is correct, and what I originally wrote above (dividing by fx and fy instead of w and h) was a mistake/typo. Sorry about that! In general, having cx and cy be a value that's not 0.5 isn't inherently a problem, but it should make you double-check that the intrinsics are correct. As far as the pixelSplat code is concerned, since it's based on diff-gaussian-rasterization, it only supports a principal point in the center of the image at (0.5, 0.5). As a workaround, I would suggest cropping the input images so the principal point is at the image center and then adjusting the intrinsics accordingly.
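One possible way to do the suggested crop is sketched below. This is an illustration rather than code from the repo: the helper name `center_principal_point` and the intrinsics are hypothetical, and only the (h, w) = (968, 1296) image size comes from this thread. The sketch crops at whole-pixel boundaries, so a principal point with a fractional part ends up within half a pixel of the center rather than exactly on it.

```python
import numpy as np


def center_principal_point(image, K_pixels):
    """Crop `image` (H, W, ...) so the principal point sits at the crop center.

    Returns the cropped image and the pixel intrinsics adjusted for the crop.
    """
    h, w = image.shape[:2]
    K = np.asarray(K_pixels, dtype=np.float64).copy()
    cx, cy = K[0, 2], K[1, 2]

    # Largest half-extent that fits on both sides of the principal point.
    half_w = int(np.floor(min(cx, w - cx)))
    half_h = int(np.floor(min(cy, h - cy)))

    # Top-left corner of the crop, snapped to whole pixels.
    x0 = int(round(cx)) - half_w
    y0 = int(round(cy)) - half_h

    cropped = image[y0:y0 + 2 * half_h, x0:x0 + 2 * half_w]

    # Cropping shifts the principal point; focal lengths in pixels are unchanged.
    K[0, 2] -= x0
    K[1, 2] -= y0
    return cropped, K


# Hypothetical pixel intrinsics for a (h, w) = (968, 1296) image.
image = np.zeros((968, 1296, 3), dtype=np.uint8)
K = np.array([[1170.0, 0.0, 652.0],
              [0.0, 1170.0, 480.0],
              [0.0, 0.0, 1.0]])

cropped, K_crop = center_principal_point(image, K)
new_h, new_w = cropped.shape[:2]

# Normalize with the cropped size: first row by width, second row by height.
K_norm = K_crop.copy()
K_norm[0, :] /= new_w
K_norm[1, :] /= new_h
print(cropped.shape, K_norm[0, 2], K_norm[1, 2])
```

Note that the normalized focal lengths change after the crop because they are divided by the new, smaller width and height.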