btsmart / splatt3r

Official repository for Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

Residual color prediction #12

Closed bluestyle97 closed 3 weeks ago

bluestyle97 commented 3 weeks ago

Hi, thanks for your awesome work! In Sec 3.3 of the paper, you mentioned that "Additionally, to aid in the learning of high-frequency color, we seek to predict the residual between each pixel’s color and the color we apply to that pixel’s corresponding Gaussian primitive". Could you please explain this in more detail? Also, could you please point out where the code for this operation lives in this repo?

btsmart commented 3 weeks ago

Hello, you can find this step in lines 100-107 of main.py here: https://github.com/btsmart/splatt3r/blob/8123e4b0e3258d80fe724db8033fc6aca56decf0/main.py#L102. We found that without these lines (i.e. trying to have the network directly predict the color of the Gaussians without any skip connections), the predicted colors were low frequency and didn't capture discontinuities in the scene very well. To help with this we train the network not to predict the output color, but to predict the difference between the input pixel color and the color of the Gaussian for that pixel. If the network predicts 'zero' for the color differences, then the Gaussians will just have whatever color the input pixels have.
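
Written out as a formula, the idea is roughly the following. The symbols are notation introduced here for illustration, not taken from the paper: $c_i$ is the input color of pixel $i$, $\Delta_i$ is the network's output for that pixel, and $\hat{c}_i$ is the color assigned to the corresponding Gaussian.

```latex
\hat{c}_i = c_i + \Delta_i ,
\qquad
\Delta_i = 0 \;\Longrightarrow\; \hat{c}_i = c_i
```

So the network only has to learn the correction $\Delta_i$, and the high-frequency detail already present in the input image is carried through by the skip term $c_i$.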

Our implementation is a bit hacky, particularly when it comes to higher degree spherical harmonics. We convert the input pixel color into zero-degree spherical harmonics form, then add that to the first term of the predicted spherical harmonics. I would be interested to see other implementations for this same process (such as adding skip connections to the model internally from the ViT to the Gaussian head) which may improve performance, particularly when using higher degree spherical harmonics.
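
For anyone who wants to see the shape of that step, here is a minimal single-pixel sketch in plain Python. It is not the code from main.py: the function names are hypothetical, and the RGB-to-SH mapping `(rgb - 0.5) / C0` is the convention commonly used in Gaussian splatting code, which I am assuming applies here. The network output is treated as a residual, and the input pixel color is added only to the degree-0 coefficient.

```python
# Value of the degree-0 real spherical harmonic basis function: 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814


def rgb_to_sh0(rgb):
    """Map an RGB color in [0, 1] to a degree-0 SH coefficient (assumed convention)."""
    return [(c - 0.5) / SH_C0 for c in rgb]


def sh0_to_rgb(sh0):
    """Inverse of rgb_to_sh0: the color produced by the degree-0 term alone."""
    return [c * SH_C0 + 0.5 for c in sh0]


def add_color_residual(pixel_rgb, predicted_sh):
    """Add the input pixel color to the first (degree-0) SH term.

    predicted_sh is a list of [r, g, b] coefficient triples, one per SH basis
    function, with index 0 holding the degree-0 term. Higher-degree terms are
    returned unchanged, which is the "hacky" part described above.
    """
    base = rgb_to_sh0(pixel_rgb)
    out = [list(coeffs) for coeffs in predicted_sh]  # copy, do not mutate the input
    out[0] = [b + r for b, r in zip(base, out[0])]
    return out


# If the network predicts an all-zero residual, the Gaussian keeps the pixel color.
pixel = [0.2, 0.6, 0.9]
zero_residual = [[0.0, 0.0, 0.0] for _ in range(4)]  # 4 basis functions up to degree 1
final_sh = add_color_residual(pixel, zero_residual)
recovered = sh0_to_rgb(final_sh[0])
```

With an all-zero prediction, `recovered` matches `pixel` up to floating-point error, which is the "predicts 'zero'" case from my previous comment.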

bluestyle97 commented 3 weeks ago

Thanks for your detailed explanation! Congratulations again on your great work!