dcharatan / pixelsplat

[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann
http://davidcharatan.com/pixelsplat/
MIT License

Question about paper #62

Closed by Gynjn 6 months ago

Gynjn commented 6 months ago

Thanks for sharing your great effort and work.

I have a question about the meaning of this line on page 4:

These enable our encoder to propagate scaled depth estimates to parts of the image feature maps that may not have any epipolar correspondences in the opposite image.

I understand the first part, "These enable our encoder to propagate scaled depth estimates to parts of the image feature maps," which explains why you used the conv block and attention. But what does "that may not have any epipolar correspondences in the opposite image" mean in this context?

Thanks in advance:)

dcharatan commented 6 months ago

Since the two images have slightly different viewpoints, not every point in the first image is expected to have a visible correspondence in the second image.

A slightly more nuanced version of this is that not every ray in the first image has a corresponding epipolar line that's visible in the second image. That's what's shown below—the top right image shows all the pixels in the top left image where the corresponding epipolar lines cross the bottom right image. For the black pixels, there can be no epipolar correspondence in the second image, since there isn't even a (visible) epipolar line.

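[image: mask over the first image marking the pixels whose epipolar lines cross the second image; black pixels have no visible epipolar line]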
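
For readers who want to reproduce a mask like this, here is a minimal NumPy sketch. It is not code from the pixelSplat repository. The function name `epipolar_visibility_mask`, its parameters, and the near/far depth range are all hypothetical. It assumes a shared pinhole intrinsics matrix `K` with last row `[0, 0, 1]` and a relative pose `(R, t)` that maps camera-1 coordinates to camera-2 coordinates. Instead of clipping each epipolar line segment against the image border exactly, it samples points along each ray and checks whether any of them lands inside the second image, so it is only an approximation.

```python
import numpy as np

def epipolar_visibility_mask(K, R, t, height, width,
                             near=0.5, far=100.0, num_samples=64):
    """Approximate which image-1 pixels have a visible epipolar line in image 2.

    Assumptions (not taken from the pixelSplat code):
      - K is a 3x3 pinhole intrinsics matrix shared by both cameras.
      - R (3x3) and t (3,) map camera-1 coordinates to camera-2
        coordinates: x2 = R @ x1 + t.
    Returns a (height, width) boolean array.
    """
    # Pixel centers of image 1 in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)

    # Back-project to camera-1 rays with z = 1.
    rays = pixels @ np.linalg.inv(K).T                       # (H, W, 3)

    # Sample depths uniformly in inverse depth between near and far.
    depths = 1.0 / np.linspace(1.0 / near, 1.0 / far, num_samples)

    # 3D points along every ray, moved into the camera-2 frame.
    points1 = rays[..., None, :] * depths[:, None]           # (H, W, S, 3)
    points2 = points1 @ R.T + t                              # (H, W, S, 3)

    # Project into image 2, ignoring points behind the camera.
    z = points2[..., 2]
    in_front = z > 1e-6
    proj = points2 @ K.T
    safe_z = np.where(in_front, z, 1.0)
    x = proj[..., 0] / safe_z
    y = proj[..., 1] / safe_z

    inside = in_front & (x >= 0) & (x < width) & (y >= 0) & (y < height)

    # A pixel counts as "visible" if any sample on its ray lands in image 2.
    return inside.any(axis=-1)


# Example: a second camera shifted sideways sees only part of the rays.
K = np.array([[64.0, 0.0, 32.0],
              [0.0, 64.0, 32.0],
              [0.0, 0.0, 1.0]])
mask = epipolar_visibility_mask(K, np.eye(3), np.array([-10.0, 0.0, 0.0]), 64, 64)
```

Pixels where `mask` is `False` correspond to the black pixels described above: no sampled point on their rays falls inside the second view, so the encoder cannot find an epipolar correspondence for them and has to get depth information from elsewhere in the feature map.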