dvlab-research / VoxelNeXt

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (CVPR 2023)
https://arxiv.org/abs/2303.11301
Apache License 2.0

large model training loss nan #19

Closed by JayYangSS 1 year ago

JayYangSS commented 1 year ago

Hi, thanks for sharing the code. How many GPUs did you use to train the large model on the Waymo dataset? I am training with tools/cfgs/waymo_models/voxelnext_ioubranch_large.yaml on four 3090 GPUs with mixed-precision training, but the loss becomes NaN right at the start of training. Is it because my batch size is too small, because the learning rate of 0.003 is too large, or because of use-amp? I have tried lowering the learning rate and turning off mixed-precision training, but neither helped. Could you please provide a pretrained model if possible?

yukang2017 commented 1 year ago

Hi,

I think the cause might be mixed precision. In addition, please contact me via email, and I will send you the checkpoint.

Regards, Yukang Chen
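
For readers wondering how mixed precision can lead to a NaN loss in general, here is a minimal sketch using only the Python standard library. It is not taken from VoxelNeXt or from any training framework; the helpers `to_fp16` and `is_finite_loss` are hypothetical names introduced for illustration. It only shows the number-format effect: a value above the half-precision range overflows to infinity, and arithmetic on infinities can then yield NaN. Whether this is what happened in this issue is not confirmed by the thread.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def to_fp16(x: float) -> float:
    """Round a Python float to half precision (hypothetical helper).

    Values outside the fp16 range saturate to +/-inf, mimicking what a
    half-precision cast does on hardware.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # "e" is the IEEE 754 binary16 format; the round trip applies fp16 rounding.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values, so saturate manually.
        return math.copysign(math.inf, x)


def is_finite_loss(loss: float) -> bool:
    """Guard a training loop could use before stepping (hypothetical helper)."""
    return math.isfinite(loss)


big_activation = to_fp16(70000.0)        # above FP16_MAX, so it overflows to inf
diff = big_activation - big_activation   # inf - inf evaluates to nan
tiny_grad = to_fp16(1e-8)                # below the fp16 subnormal range, rounds to 0.0

print(big_activation, diff, tiny_grad, is_finite_loss(diff))
```

A check like `is_finite_loss` placed right after the loss computation can help narrow down the first iteration at which the value goes bad, regardless of which of the suspected causes (batch size, learning rate, or mixed precision) turns out to be responsible.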