city-super / GSDF

[NeurIPS 2024] GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction

Loss NaN error at training iteration 15020 when tiny-cuda-nn is enabled #10

Closed: dongliangcao closed this issue 3 weeks ago

dongliangcao commented 1 month ago

Thanks a lot for the bug fix for tiny-cuda-nn. However, when I enable tiny-cuda-nn training, I always run into the same issue: as training approaches iteration 15020, the loss becomes NaN and training breaks. Do you have any insight into this? Thanks again.
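To narrow down a failure like this, one common approach is to stop at the first non-finite loss or gradient so the exact iteration and parameter are known. A minimal sketch assuming a standard PyTorch training loop; `model`, `optimizer`, `batch`, and `compute_loss` are hypothetical placeholders, not GSDF code:

```python
import torch

def training_step(model, optimizer, batch, step):
    optimizer.zero_grad()
    loss = compute_loss(model, batch)  # hypothetical loss function

    # Stop at the first non-finite loss so the offending iteration is known.
    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite loss {loss.item()} at step {step}")

    loss.backward()

    # Check gradients too: NaNs often show up here one step before the loss does.
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            raise RuntimeError(f"Non-finite gradient in {name} at step {step}")

    optimizer.step()
    return loss.item()
```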

MulinYu commented 3 weeks ago

Maybe set self.use_tcnn = False. We are working on accelerating training and will release the update soon.
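For reference, a sketch of what such a switch typically looks like. Only the flag name self.use_tcnn comes from this thread; the class, dimensions, and network config below are illustrative assumptions. Disabling the tiny-cuda-nn path falls back to a plain float32 MLP, which is numerically more forgiving than the half-precision FullyFusedMLP:

```python
import torch
import torch.nn as nn

class FieldMLP(nn.Module):
    """Hypothetical sketch of a use_tcnn switch; not the GSDF implementation."""

    def __init__(self, n_input_dims=3, n_output_dims=4, use_tcnn=False):
        super().__init__()
        self.use_tcnn = use_tcnn  # set to False to avoid the fused tiny-cuda-nn path
        if self.use_tcnn:
            import tinycudann as tcnn
            self.net = tcnn.Network(
                n_input_dims=n_input_dims,
                n_output_dims=n_output_dims,
                network_config={
                    "otype": "FullyFusedMLP",   # runs in half precision internally
                    "activation": "ReLU",
                    "output_activation": "None",
                    "n_neurons": 64,
                    "n_hidden_layers": 2,
                },
            )
        else:
            # Plain float32 PyTorch fallback.
            self.net = nn.Sequential(
                nn.Linear(n_input_dims, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, n_output_dims),
            )

    def forward(self, x):
        return self.net(x)
```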

df34 commented 3 weeks ago

In fact, even when I set self.use_tcnn = False, the loss still becomes NaN at around iteration 15,010. May I ask what to do in this case?
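Since the NaN appears with both backends, generic PyTorch diagnostics may help regardless of the cause. A sketch, not GSDF-specific: enabling anomaly detection gives a traceback pointing at the op that first produced a NaN in backward(), and clipping or skipping the offending step can confirm whether exploding gradients are to blame.

```python
import torch

# Produces a traceback at the op that first created a NaN during backward().
torch.autograd.set_detect_anomaly(True)

def guarded_step(loss, model, optimizer, max_grad_norm=1.0):
    # Skip the update entirely if the loss is already non-finite.
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return False

    loss.backward()
    # Clip gradients; exploding gradients are a common cause of late-training NaNs.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```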