Large model training loss NaN

Hi, thanks for sharing the code. How many GPUs did you use to train the large model on the Waymo dataset? I am using tools/cfgs/waymo_models/voxelnext_ioubranch_large.yaml on 4 x 3090 GPUs with mixed-precision training, but the loss becomes NaN right at the beginning of training. Could this be because my batch size is too small, because the learning rate of 0.003 is too large, or because of use-amp? (I have tried lowering the learning rate and disabling mixed-precision training, but neither helped.) If possible, could you please provide a pretrained model?
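To help narrow down where the NaN first appears, a check along the following lines could be dropped into the training loop. This is only a minimal sketch, not code from the VoxelNeXt repository: the helper `find_non_finite` and the loss-term names (`loss_hm`, `loss_loc`, `loss_iou`) are hypothetical placeholders, and it assumes each loss term can be converted to a Python float.

```python
import math


def find_non_finite(loss_terms):
    """Return the names of loss terms whose value is NaN or +/-inf.

    `loss_terms` maps a (hypothetical) loss-term name to a value that
    can be converted with float(), e.g. a Python number or a
    single-element tensor.
    """
    bad = []
    for name, value in loss_terms.items():
        if not math.isfinite(float(value)):
            bad.append(name)
    return bad


# Example with placeholder values; in a real run these would come from
# the model's per-term losses at the current iteration.
terms = {"loss_hm": 1.27, "loss_loc": float("nan"), "loss_iou": 0.31}
bad_terms = find_non_finite(terms)
if bad_terms:
    print("non-finite loss terms:", bad_terms)
```

Logging the offending term together with the iteration number may show whether the problem starts in one branch or in all terms at once, which would point more toward the input data or the learning rate than toward mixed precision.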

dvlab-research / VoxelNeXt

Large model training loss NaN #19