Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

The loss is nan when refine reconstruction #90

Closed QY0911 closed 9 months ago

QY0911 commented 10 months ago

During the refined reconstruction, the loss becomes NaN (the coarse reconstruction runs normally).

Loading mesh to bind to: ./output/coarse_mesh/endonerf-data/sugarmesh_3Dgs7000_sdfestim02_sdfnorm02_level03_decim1000000.ply...
Mesh to bind to loaded.
Binding radiance cloud to surface mesh...

SuGaR model has been initialized.
SuGaR()
Number of parameters: 109040169
Checkpoints will be saved in ./output/refined/endonerf-data/sugarfine_3Dgs7000_sdfestim02_sdfnorm02_level03_decim1000000_normalconsistency01_gaussperface1/

Model parameters:
_surface_mesh_faces torch.Size([1999752, 3]) False
surface_mesh_thickness torch.Size([]) False
_points torch.Size([1017771, 3]) True
all_densities torch.Size([1999752, 1]) True
_scales torch.Size([1999752, 2]) True
_quaternions torch.Size([1999752, 2]) True
_sh_coordinates_dc torch.Size([1999752, 1, 3]) True
_sh_coordinates_rest torch.Size([1999752, 15, 3]) True
Using as spatial_lr_scale: 348046223.36 with bbox_radius: 34804622336.0 and n_vertices_in_fg: 1000000
Optimizer initialized.
Optimization parameters:
OptimizationParams(
    iterations=15000,
    position_lr_init=0.00016,
    position_lr_final=1.6e-06,
    position_lr_delay_mult=0.01,
    position_lr_max_steps=30000,
    feature_lr=0.0025,
    opacity_lr=0.05,
    scaling_lr=0.005,
    rotation_lr=0.001,
)
Optimizable parameters:
points 55687.3957376
sh_coordinates_dc 0.0025
sh_coordinates_rest 0.000125
all_densities 0.05
scales 0.005
quaternions 0.001
Using loss function: l1+dssim


Iteration: 1 loss: 0.390292 [ 1/15000] computed in 0.002570327123006185 minutes.
------Stats----- ---Min, Max, Mean, Std
Points: nan nan nan nan
Scaling factors: 9004.994140625 104001824.0 12808996.0 13129376.0
Quaternions: nan nan nan nan
Sh coordinates dc: -1.4712525606155396 1.6890442371368408 -0.021527336910367012 0.5082155466079712
Sh coordinates rest: 0.0 0.0 0.0 0.0
Opacities: 0.09558913111686707 0.10459084808826447 0.10090889781713486 0.0028586185071617365

Iteration: 50 loss: nan [ 50/15000] computed in 0.10855583349863689 minutes.
------Stats----- ---Min, Max, Mean, Std
Points: nan nan nan nan
Scaling factors: 9004.994140625 106616088.0 12910966.0 13285273.0
Quaternions: nan nan nan nan
Sh coordinates dc: -1.4712525606155396 1.6890442371368408 -0.01923723705112934 0.5087924599647522
Sh coordinates rest: 0.0 0.0 0.0 0.0
Opacities: 0.06902769207954407 0.6108894944190979 0.10668428987264633 0.018140438944101334

Iteration: 100 loss: nan [ 100/15000] computed in 0.1061731735865275 minutes.
------Stats----- ---Min, Max, Mean, Std
Points: nan nan nan nan
Scaling factors: 9004.994140625 106637248.0 12911991.0 13286570.0
Quaternions: nan nan nan nan
Sh coordinates dc: -1.4712525606155396 1.6890442371368408 -0.01920899748802185 0.5088003277778625
Sh coordinates rest: 0.0 0.0 0.0 0.0
Opacities: 0.06884335726499557 0.9247159361839294 0.10677241533994675 0.019122010096907616

Iteration: 150 loss: nan [ 150/15000] computed in 0.10701924959818522 minutes.
------Stats----- ---Min, Max, Mean, Std
Points: nan nan nan nan
Scaling factors: 9004.994140625 106637248.0 12912225.0 13286637.0
Quaternions: nan nan nan nan
Sh coordinates dc: -1.4712525606155396 1.6890442371368408 -0.019203422591090202 0.5088027119636536
Sh coordinates rest: 0.0 0.0 0.0 0.0
Opacities: 0.06884214282035828 0.9834164381027222 0.10678516328334808 0.019477466121315956

Anttwo commented 10 months ago

Hello @QY0911,

It looks like your spatial_lr_scale is equal to 348046223.36, which is insanely huge! I think this extremely large learning rate explains your nan values.
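For a sense of scale: in your log, spatial_lr_scale is exactly bbox_radius / 100 (34804622336.0 / 100 = 348046223.36), and the effective learning rate for the positions is position_lr_init × spatial_lr_scale = 0.00016 × 348046223.36 ≈ 55687.4, which matches the `points 55687.3957376` entry under "Optimizable parameters". On a typical COLMAP scene, where camera coordinates span only a few units, this product stays well below 1.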

During refinement, the value of spatial_lr_scale (which is a scaling factor for the learning rate of 3D Gaussians' positions and scales) depends on the geometry of the scene. Specifically, it depends on the spatial extent of the foreground bounding box, which is equal to the bounding box of the camera trajectory by default (but the user is free to provide a custom bounding box).
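As a rough sketch of the idea (illustrative only, not the actual SuGaR code; the function name and exact formula are assumptions, though the /100 ratio matches the log above):

```python
import torch

def estimate_spatial_lr_scale(camera_centers: torch.Tensor) -> float:
    """Illustrative only: derive a position-learning-rate scale from the
    bounding box of the camera trajectory (camera_centers: [n_cams, 3])."""
    bbox_min = camera_centers.min(dim=0).values
    bbox_max = camera_centers.max(dim=0).values
    # Half the bounding-box diagonal acts as the spatial extent of the scene.
    bbox_radius = 0.5 * (bbox_max - bbox_min).norm().item()
    # The base learning rate of the positions is multiplied by this value,
    # so coordinates in the billions blow up every position update.
    return bbox_radius / 100.0  # matches the ratio seen in the log above
```

With camera coordinates in a normal range this multiplier stays small; with coordinates in the billions, as in your log, it explodes and the first optimizer step already pushes the points to NaN.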

May I ask, what does your scene look like? Did you compute the camera poses with the COLMAP script provided in the gaussian_splatting directory (convert.py)? Do you know if your scene is very large (i.e., with coordinates that can take very large values)? Did you use the default hyperparameters, or did you use some custom arguments?
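If it helps to check the scene extent, here is a small diagnostic sketch (not part of the repo; the path `sparse/0/images.txt` assumes a COLMAP text model exported for your dataset). It prints the range of the camera centers; values in the millions or billions would confirm the scale problem:

```python
# Hypothetical diagnostic, not part of SuGaR: print the extent of the camera
# centers stored in a COLMAP text model (here assumed at sparse/0/images.txt).
import numpy as np

def quat_to_rotmat(qw, qx, qy, qz):
    """Rotation matrix from a unit quaternion (w, x, y, z), COLMAP convention."""
    return np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz), 2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy), 2 * (qy * qz + qw * qx), 1 - 2 * (qx * qx + qy * qy)],
    ])

centers = []
with open("sparse/0/images.txt") as f:
    rows = [line.strip() for line in f if not line.startswith("#")]
for row in rows[::2]:  # each image takes two rows; the first one holds the pose
    if not row:
        continue
    vals = row.split()
    qw, qx, qy, qz, tx, ty, tz = map(float, vals[1:8])
    R, t = quat_to_rotmat(qw, qx, qy, qz), np.array([tx, ty, tz])
    centers.append(-R.T @ t)  # world-space camera center from world-to-camera pose

centers = np.array(centers)
print("min:", centers.min(axis=0))
print("max:", centers.max(axis=0))
print("camera bbox diagonal:", np.linalg.norm(centers.max(axis=0) - centers.min(axis=0)))
```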

QY0911 commented 10 months ago

Really, thank you for your reply! I am using my own dataset (about 60 images). First I ran dense reconstruction with COLMAP, then I trained on it, using the default COLMAP parameters for everything in my processing. I'm not sure whether my scene is too large; how can I best continue the training in this case?