HorizonRobotics / Sparse4D

MIT License

cls and Dn loss nan #50

Closed: Gaondong closed this issue 4 months ago

Gaondong commented 4 months ago

Thanks for your excellent work. I disabled the depth estimation branch, but the cls loss and dn loss became NaN after 38000+ iterations. My optimizer config:

    optimizer = dict(
        type="AdamW",
        lr=2e-4,
        weight_decay=0.001,
        paramwise_cfg=dict(
            custom_keys={
                "img_backbone": dict(lr_mult=0.5),
            }
        ),
    )

linxuewu commented 4 months ago

Both reducing the batch size and disabling the depth auxiliary task require a lower learning rate. You need to decrease the learning rate further.
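The comment above does not say how much lower the learning rate should be. One common starting heuristic (an assumption here, not something stated in this thread) is the linear scaling rule, which keeps the ratio of learning rate to batch size constant. A minimal sketch, where `scale_lr` and all the numbers are hypothetical:

```python
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size constant.

    This is only a starting point. Removing an auxiliary task (such as
    the depth branch) can require going lower than this rule suggests.
    """
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Hypothetical example: a reference setup trained at lr=4e-4 with
# batch size 64, and a new run that only fits batch size 32.
new_lr = scale_lr(base_lr=4e-4, base_batch_size=64, batch_size=32)
print(f"suggested starting lr: {new_lr:.1e}")
```

Treat the result as an upper bound to try first; if the loss still diverges, keep lowering it.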

Gaondong commented 4 months ago

My batch size is 64. The learning rate was 4e-4 at first; I reduced it to 2e-4, but the NaN still appears. Should I lower it to 1e-4?

linxuewu commented 4 months ago

Is the dataset you are using the nuScenes dataset?

Gaondong commented 4 months ago

It's not nuScenes; it's a custom dataset.

linxuewu commented 4 months ago

I can't determine the cause of the problem, because there are too many influencing factors.
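When the cause is unclear, it can help to at least record which loss term becomes non-finite first and at which iteration. The following is a small framework-independent sketch, not part of Sparse4D; the function name and the loss keys (`loss_cls`, `loss_dn`, `loss_box`) are hypothetical stand-ins for whatever the training log reports.

```python
import math


def find_non_finite(losses: dict) -> list:
    """Return the names of loss terms whose value is NaN or +/-inf."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Hypothetical per-iteration loss values, already converted to Python floats.
logged = {"loss_cls": float("nan"), "loss_dn": float("inf"), "loss_box": 0.42}

bad = find_non_finite(logged)
if bad:
    # In a real training loop one would stop here and inspect the batch.
    print("non-finite losses:", bad)
```

Calling such a check every iteration and stopping at the first hit narrows the search to a specific batch and loss term.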

Gaondong commented 4 months ago

I'll try to reduce the learning rate. Thanks.

zengstrive commented 1 month ago

Hello, how did you solve this problem?

Gaondong commented 1 month ago

Lowering the learning rate to 1e-4 seems to let training converge. If you still have problems, you can try reducing it further.

zengstrive commented 1 month ago

Thanks for the reply! 6e-5 is a good choice; the model has converged.
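For reference, here is roughly what the optimizer config from the first post looks like with the lower learning rate applied. This is a sketch assembled from the values mentioned in this thread; only `lr` is changed, and whether 1e-4 or 6e-5 works will depend on the dataset and batch size. The `backbone_lr` line is added for illustration and assumes the usual mmcv behavior where `lr_mult` multiplies the base learning rate for matching parameters.

```python
optimizer = dict(
    type="AdamW",
    lr=6e-5,  # lowered from 2e-4; 1e-4 was also reported to converge
    weight_decay=0.001,
    paramwise_cfg=dict(
        custom_keys={
            # Parameters whose name contains "img_backbone" use lr * lr_mult.
            "img_backbone": dict(lr_mult=0.5),
        }
    ),
)

# Effective learning rate for the image backbone under the assumption above.
lr_mult = optimizer["paramwise_cfg"]["custom_keys"]["img_backbone"]["lr_mult"]
backbone_lr = optimizer["lr"] * lr_mult
print(f"base lr: {optimizer['lr']:.1e}, backbone lr: {backbone_lr:.1e}")
```

Note that lowering the base `lr` lowers the backbone rate proportionally, so `lr_mult` does not need to change.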