LinShan-Bin / OccNeRF

Code of "OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments".

Apache License 2.0

Some problems of nusc-depth training code #5

Open: eliliu2233 opened this issue 10 months ago

eliliu2233 commented 10 months ago

Thanks for your great work! I ran the training code for depth estimation and found the following two problems:

1. I sometimes get NaN values when using fp16.
2. The loss does not decrease during training.

Could you please give me some advice on these problems? I ran the nusc-depth training code on 4 GPUs with the same settings as the released code (auxiliary_frame=True and use_fp16=True).

LinShan-Bin commented 10 months ago

Thanks for your feedback!

1. Since we didn't observe this NaN issue in our initial experiments, could you kindly provide additional details, such as the training log? We are actively working to reproduce the error and are committed to improving the stability of fp16 training.
2. A loss that does not decrease is normal when using the photometric loss. The network is still learning, so you can wait for the final result.
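One common explanation for a flat loss curve, which the thread does not spell out: a photometric loss compares the target image with a source view warped through the predicted depth, and part of that error cannot be removed by better geometry (exposure differences between frames or cameras, occlusions, moving objects). The sketch below assumes a Monodepth2-style per-pixel error, `0.85 * (1 - SSIM) / 2 + 0.15 * |I_t - I_s|` with a 3x3 SSIM window; OccNeRF's actual loss may differ, and the helpers `box3`, `dssim`, and `photometric_error` are hypothetical. It compares a perfectly aligned view, the same view with a 10% brightness change, and a view shifted by three pixels.

```python
import numpy as np

def box3(x):
    """3x3 mean filter with reflection padding (the usual SSIM window)."""
    p = np.pad(x, 1, mode="reflect")
    h, w = x.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def dssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel (1 - SSIM) / 2, clipped to [0, 1]."""
    mu_a, mu_b = box3(a), box3(b)
    var_a = box3(a * a) - mu_a ** 2
    var_b = box3(b * b) - mu_b ** 2
    cov = box3(a * b) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return np.clip((1.0 - num / den) / 2.0, 0.0, 1.0)

def photometric_error(target, warped, alpha=0.85):
    """Hypothetical Monodepth2-style per-pixel photometric error (SSIM + L1)."""
    l1 = np.abs(target - warped)
    return alpha * dssim(target, warped) + (1.0 - alpha) * l1

rng = np.random.default_rng(0)
target = rng.random((32, 32))  # grayscale texture in [0, 1)

aligned = photometric_error(target, target).mean()                      # perfect warp
brighter = photometric_error(target, 1.1 * target).mean()              # perfect warp, exposure change
shifted = photometric_error(target, np.roll(target, 3, axis=1)).mean()  # wrong geometry

print(f"aligned={aligned:.4f} brighter={brighter:.4f} shifted={shifted:.4f}")
```

The brightness-only case stays above zero even though the alignment is perfect. Residuals like this are one reason the total loss can plateau while the predicted depth keeps improving.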

eliliu2233 commented 10 months ago

Thanks for your reply. I found that the NaN always occurs while calculating the rendering weights, so I forced the tensors in the weight calculation to fp32, which solved the problem.
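The thread does not say which operation inside the weight calculation produced the NaN. One plausible mechanism, sketched here in NumPy under stated assumptions: on a ray through nearly empty space, each sample's optical thickness `sigma * delta` is around 1e-6. In fp16, `exp(-1e-6)` rounds to exactly 1.0, so every alpha and every weight becomes 0, and normalizing the depth by the accumulated weight then gives 0/0 = NaN. In fp32 the same alphas stay around 1e-6 and the depth is finite. The helper `render_depth`, the normalization step, and the input values are hypothetical; this is not OccNeRF's rendering code.

```python
import numpy as np

def render_depth(sigma, z, dtype):
    """Expected depth along one ray, computed entirely in `dtype`.

    Hypothetical helper, not OccNeRF's code: NeRF-style weights
    w_i = T_i * alpha_i, with the depth normalized by the accumulated weight.
    """
    sigma = sigma.astype(dtype)
    z = z.astype(dtype)
    deltas = np.diff(z)                         # distance between adjacent samples
    alpha = 1.0 - np.exp(-sigma[:-1] * deltas)  # per-interval opacity
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1
    trans = np.cumprod(np.concatenate([np.ones(1, dtype=dtype), 1.0 - alpha[:-1]]))
    weights = trans * alpha
    with np.errstate(invalid="ignore"):         # let 0/0 become nan without a warning
        return (weights * z[:-1]).sum() / weights.sum()

# A nearly empty ray (e.g. sky): tiny density at every sample.
sigma = np.full(64, 1.3e-6)
z = np.linspace(1.0, 50.0, 64)

depth16 = render_depth(sigma, z, np.float16)  # alpha rounds to 0, so 0/0 gives nan
depth32 = render_depth(sigma, z, np.float32)  # alpha stays near 1e-6, depth is finite
print(depth16, depth32)
```

In PyTorch, the corresponding change is usually to cast the inputs with `.float()` before computing alpha, transmittance, and weights, and, if the code runs under `torch.autocast`, to wrap that computation in `torch.autocast(device_type="cuda", enabled=False)`. The commenter did not post their exact edit, so treat this as one possible form of it.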