Nightmare-n / UniPAD

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving (CVPR 2024)
https://arxiv.org/abs/2310.08370
Apache License 2.0

Question about FP32 Training #16

Open dingli-dean opened 5 months ago

dingli-dean commented 5 months ago

Hi team, thanks for releasing this exceptional work. In the released log (abl_uvtr_cam_vs0.1_finetune.log), a pretrained model checkpoint can be seen at Line 1236:

2023-11-18 18:34:11,643 - mmdet - INFO - load checkpoint from work_dirs/convnext_s_pretrain_enorm_nuscenes_fp32_d32_x1_nofar/epoch_12.pth

Does this indicate that the UniPAD model is pretrained with FP32? We also notice that the default pretraining setting in this repo is FP16 (fp16_enabled=True). Could you share a comparison between FP32 pretraining and FP16 pretraining?

Thanks again for your attention; we look forward to your reply.

Nightmare-n commented 5 months ago

Thanks for your interest. fp32 means that we use FP32 for the render head, which is the default setting in the config (here).
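To make the precision split concrete, here is a minimal sketch of how a per-module precision flag can work in an mmdetection-style Python config. The module names and values below are illustrative assumptions, not the actual UniPAD config:

```python
# Illustrative sketch (not the actual UniPAD config): mmdetection-style
# configs are plain Python dicts, and a per-module flag such as
# `fp16_enabled` controls whether that module runs in half precision.
# Keeping the render head in FP32 while the rest of the model uses FP16
# might look like:
model = dict(
    type="UVTR",                  # detector trained under FP16 AMP
    render_head=dict(
        type="RenderHead",        # hypothetical module name
        fp16_enabled=False,       # False => this head computes in FP32
    ),
)

# A config consumer could read the flag like this:
render_fp16 = model["render_head"]["fp16_enabled"]
print("render head precision:", "fp16" if render_fp16 else "fp32")
```

The point is that FP16 for the detector and FP32 for the render head are controlled independently, so "fp32" in the checkpoint name can refer to the render head alone.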

dingli-dean commented 5 months ago

Thanks for your timely reply.

[image]

I tried to reproduce the ablation results of Table 3 in the camera-only setting. The config for the pretraining stage is projects/configs/unipad/uvtr_cam_vs0.1_pretrain.py, and the config for the finetuning stage is projects/configs/unipad_abl/abl_uvtr_cam_vs0.1_finetune.py. The results are shown as follows:

[image: reproduction results table]

The top two lines show that the 'train-from-scratch' results can be reproduced successfully. However, the 'UniPAD Pretrain' results cannot be reproduced well. Line 3 and Line 4 show the results of Table 3 in the paper and of the released model on Google Cloud, respectively, while Line 5 shows our result using uvtr_cam_vs0.1_pretrain.py and abl_uvtr_cam_vs0.1_finetune.py. Our results are clearly lower than those in Line 3 and Line 4. It is worth noting that 'fp16_enabled=True' is set in uvtr_cam_vs0.1_pretrain.py, which means FP16 is used for the render head; in contrast, 'fp16_enabled=False' is set in uvtr_cam_vs0.075_pretrain.py.

All in all, if I want to reproduce the ablation results of Table 3, should FP32 be used in uvtr_cam_vs0.1_pretrain.py?

BTW, I also tried loading the released pretrained model from Google Cloud (uvtr_cam_vs0.1_pretrain.pth) and finetuning on the downstream detection task. The corresponding results are shown in Line 6, and they still show a gap compared with Table 3.

Could you give some advice on reproducing the results?

Thanks again for your attention; we look forward to your reply.

Nightmare-n commented 5 months ago

Hi, the result in the paper uses the config uvtr_cam_vs0.1_pretrain.py (it indeed uses FP16; I forget why it was named fp32). Did you use 4 A100 GPUs to reproduce the results?

dingli-dean commented 5 months ago

> Hi, the result in the paper uses the config of uvtr_cam_vs0.1_pretrain.py (indeed use fp16 but I forget why to name it fp32). Do you use 4 A100 GPUs to reproduce the results?

Actually, I tried setting FP32 for the render head, and the performance is slightly lower than that of uvtr_cam_vs0.1_pretrain.py. Unfortunately, our platform differs: 8 RTX 3090 GPUs are used for reproduction.

Nightmare-n commented 5 months ago

Hi, please try retraining with 4 GPUs, as the total batch size may differ and the learning rate needs to be adjusted accordingly.
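For readers hitting the same gap: when the GPU count (and thus the effective batch size) changes, a common heuristic is the linear scaling rule, i.e. scale the learning rate in proportion to the total batch size. A minimal sketch, with all concrete numbers assumed for illustration rather than taken from the repo:

```python
# Hedged sketch of the linear scaling rule: when the effective batch size
# changes, scale the learning rate by the same factor. The base_lr and
# batch sizes below are illustrative assumptions, not UniPAD's settings.

def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate linearly with the total batch size."""
    return base_lr * new_batch / base_batch

# Example: a config tuned for 4 GPUs x 4 samples/GPU (total batch 16),
# reproduced on 8 GPUs x 4 samples/GPU (total batch 32).
base_lr = 2e-4                          # assumed base learning rate
lr_8gpu = scale_lr(base_lr, base_batch=16, new_batch=32)
print(lr_8gpu)  # prints 0.0004
```

This is a starting heuristic only; warmup length and schedule may also need retuning when the batch size changes.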