PJLab-ADG / neuralsim

neuralsim: 3D surface reconstruction and simulation based on 3D neural rendering.
MIT License

Loss is NaN when training StreetSurf #4

Closed filick closed 1 year ago

filick commented 1 year ago

Hi,

I tried to train the StreetSurf model on the Waymo-100613 scene but got a NaN loss at the first step. I downloaded the processed data pack and tried several configs under code_single/configs/waymo/streetsurf/. Nothing was modified except the file paths, but all experiments failed the same way. Some screenshots of the logs are below. I printed the detailed loss dict, and it seems the rgb loss and mask loss are NaN.

Experiment using nomask_withlidar.230814.yaml: [screenshot of log]

Experiment using withmask_withlidar.230814.yaml: [screenshot of log]

My environment is PyTorch 2.0.1, CUDA 11.8.

Can you take a look? Thanks.

ventusff commented 1 year ago

:thinking: I have never tried PyTorch 2 before. Taking a look now.

ventusff commented 1 year ago

Hi @filick, it seems that PyTorch 2 does not like it when the lr gets to exactly zero, which could happen in our current warmup scheduler design. It should be fixed now (sorry that I accidentally closed this issue). You can git pull to update to the latest version, then run git submodule update --init --recursive to also update the submodule nr3d_lib. Let me know if it's fixed :)
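For readers hitting the same symptom, here is a minimal sketch of the general idea: a linear warmup multiplier with a small floor, so the learning rate is never exactly zero at step 0. This is not the actual change made in nr3d_lib; the function name warmup_factor and the parameters warmup_steps and min_factor are hypothetical and chosen only for illustration.

```python
def warmup_factor(step: int, warmup_steps: int, min_factor: float = 1e-3) -> float:
    """Linear warmup multiplier that never returns exactly zero.

    Hypothetical helper for illustration; not taken from nr3d_lib.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return 1.0
    # Ramp linearly from min_factor at step 0 up to 1.0 at step == warmup_steps.
    return min_factor + (1.0 - min_factor) * step / warmup_steps


# Effective learning rates for the first few steps with a 4-step warmup.
base_lr = 1e-2
lrs = [base_lr * warmup_factor(s, warmup_steps=4) for s in range(6)]
```

A function like this could be wrapped in a lambda and passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR, for example lambda s: warmup_factor(s, 500), assuming the scheduler is stepped once per training iteration.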

filick commented 1 year ago

Yes, my training looks good now. Thanks @ventusff, you are fast!