Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

Confusion about the learning rate in sugar model #30

Closed · JuliusQv closed 11 months ago

JuliusQv commented 11 months ago

Hi ~ Thanks for your amazing work. I am confused about the learning rate for the Gaussians' positions in `sugar_optimizer.py`: why is it `position_lr_init * spatial_lr_scale`? From my understanding, `spatial_lr_scale` is the radius, which is computed from the camera positions. So a larger scene results in a larger radius, a larger `spatial_lr_scale`, and therefore a larger positional learning rate.
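For reference, here is a simplified sketch of the pattern I am asking about, paraphrased from the `training_setup` of the original 3D Gaussian Splatting code (the exact code in `sugar_optimizer.py` may differ, and the values below are only examples):

```python
import torch

# Sketch of the 3DGS-style optimizer setup (illustrative, not SuGaR's exact code).
gaussians_xyz = torch.randn(1000, 3, requires_grad=True)  # Gaussian centers
position_lr_init = 0.00016  # default position learning rate in the original 3DGS code
spatial_lr_scale = 5.0      # radius computed from the camera positions (example value)

optimizer = torch.optim.Adam(
    [{
        "params": [gaussians_xyz],
        "lr": position_lr_init * spatial_lr_scale,  # <-- the multiplication in question
        "name": "xyz",
    }],
    lr=0.0,
    eps=1e-15,
)
```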

And this is the exact opposite of what is described in the README:

[screenshot of the README]

Should it be `position_lr_init / spatial_lr_scale` instead?

Looking forward to your reply. Best regards!

Anttwo commented 11 months ago

Hi JuliusQv,

Thank you for your nice words!

In general, multiplying the position learning rate by the scale of the scene (i.e., the radius of the sphere enclosing the camera positions) is a very good idea, as it makes the method invariant to the overall scale of the scene. This was actually the idea of the authors of the original Gaussian Splatting paper.

Let me explain. The position learning rate controls how much your Gaussians move in the scene at each training iteration: the larger the learning rate, the harder it is for the Gaussians to make very small moves; the smaller the learning rate, the harder it is for them to make large moves.

Let's say you have two instances of the same dataset, with the exact same SfM point cloud and the same camera poses; however, you multiply the scale of one of the two identical scenes by 2 (so you multiply the positions of the cameras and the coordinates of the points in the SfM point cloud by 2). If you also multiply the learning rate of the larger scene by 2, then the two scenes will be optimized in exactly the same way. On the contrary, if you do not multiply the position learning rate by 2, the two scenes will behave differently, and the learning rate may be too small for the larger scene.
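This works out because 3DGS optimizes positions with Adam, whose step size is roughly proportional to the learning rate regardless of the gradient's magnitude, so scaling the learning rate by a factor scales the moves by the same factor. Below is a minimal, self-contained demo of the thought experiment on a toy L2 loss; this is not SuGaR code, just an illustration:

```python
import torch

def optimize(init, target, lr, steps=200):
    # Fit a single 3D point to a target position with Adam and an L2 loss.
    x = init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((x - target) ** 2).sum()
        loss.backward()
        opt.step()
    return x.detach()

init = torch.tensor([0.1, 0.2, 0.3])
target = torch.tensor([1.0, 1.0, 1.0])
lr, s = 1e-2, 2.0  # s plays the role of the scene scale (spatial_lr_scale)

small = optimize(init, target, lr)              # original scene
large = optimize(s * init, s * target, s * lr)  # scene and lr both scaled by s

# True: the scaled scene follows the same trajectory, up to the factor s.
print(torch.allclose(large, s * small, atol=1e-5))
```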

Consequently, multiplying the learning rate by the scale of the scene makes the method much more robust, as it no longer depends on the scale of the SfM output (so if you use an SfM method that outputs coordinates between -1 and 1, you'll get the same results as with another SfM method that outputs coordinates between -10 and 10).

**OK, now we are convinced that taking `position_lr_init * spatial_lr_scale` as the learning rate is a good idea. But what about the value of `position_lr_init`?**

Similarly, the value of `position_lr_init` has an influence on the optimization: the larger its value, the harder it is for Gaussians to fit very small structures. The default value of `position_lr_init` is tuned to produce good reconstructions on standard scenes of reasonable size. Let's say that, in a standard scene, the size of an object is between 1/100th and 1/10th of the size of the full scene; then the default value of `position_lr_init` is tuned to allow Gaussians to fit elements of that size.

On the contrary, in a very large scene like a city district, the typical size of an object may be 1/1000th of the size of the full scene. If you use the default value of `position_lr_init`, it becomes impossible for Gaussians to fit the objects of your scene correctly, so lowering `position_lr_init` greatly improves the quality of the reconstruction.
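To make this knob concrete: 3DGS-style codebases decay the position learning rate log-linearly from `position_lr_init * spatial_lr_scale` down to `position_lr_final * spatial_lr_scale` over training. Here is a simplified sketch of that schedule (without the delay/warmup options of the real helper; the division by 10 for the large scene is purely illustrative, not an official recommendation):

```python
import math

def expon_lr(step, lr_init, lr_final, max_steps):
    # Log-linear interpolation from lr_init down to lr_final, in the spirit
    # of the exponential schedule used by 3DGS-style codebases.
    t = min(max(step / max_steps, 0.0), 1.0)
    return math.exp((1.0 - t) * math.log(lr_init) + t * math.log(lr_final))

spatial_lr_scale = 5.0  # camera-extent radius (example value)

# Standard scene: defaults from the original 3DGS code.
lr_standard = expon_lr(0, 0.00016 * spatial_lr_scale, 0.0000016 * spatial_lr_scale, 30_000)

# City-scale scene: lower position_lr_init so Gaussians can settle on tiny
# objects (the factor of 10 here is an arbitrary example, not a recommendation).
lr_large = expon_lr(0, 0.00016 / 10 * spatial_lr_scale, 0.0000016 * spatial_lr_scale, 30_000)
```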

**Question: So why don't we always use a very small learning rate, so that fine details can be reconstructed in any scene?**

Using a small learning rate forces Gaussians to make small moves and to stay small, so you will generally need more Gaussians to reconstruct your scene. For standard scenes, you would end up with too many unnecessary Gaussians, which is very inefficient in terms of memory. On the contrary, when reconstructing a large scene like a city district, you generally have many more SfM points and actually want many more Gaussians in the scene.

I hope this message helps you! Thanks!

JuliusQv commented 11 months ago

Thank you for your detailed explanation. It cleared up my confusion.