dingli-dean opened 5 months ago
Thanks for your interest. fp32 means that we use fp32 for the render head, which is a default setting in the config (here)
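For anyone skimming the thread: the flag in question sits in the model config. A minimal sketch of what such a render-head entry looks like in an mmdet3d-style Python config (the head name and other keys here are illustrative placeholders, not the repo's actual values; only `fp16_enabled` is the flag discussed in this thread):

```python
# Hedged sketch of an mmdet3d-style config dict.
# `fp16_enabled` controls the render head's precision:
# False -> the render head computes in fp32, True -> fp16.
model = dict(
    pts_bbox_head=dict(
        type='RenderHead',   # hypothetical head name, for illustration only
        fp16_enabled=False,  # False means the render head runs in fp32
        in_channels=256,     # placeholder value
    ),
)

print(model['pts_bbox_head']['fp16_enabled'])  # -> False
```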
Thanks for your timely reply.
I tried to reproduce the ablation results of Table 3 under the camera-only setting. The pretraining config is projects/configs/unipad/uvtr_cam_vs0.1_pretrain.py, and the finetuning config is projects/configs/unipad_abl/abl_uvtr_cam_vs0.1_finetune.py. The results are as follows:
The top two lines show that the 'train-from-scratch' results can be reproduced successfully. However, the 'UniPAD Pretrain' results cannot. Line 3 and Line 4 show the results from Table 3 of the paper and from the released model on Google Cloud, respectively, while Line 5 shows our result obtained with uvtr_cam_vs0.1_pretrain.py and abl_uvtr_cam_vs0.1_finetune.py. Our results are clearly lower than those in Lines 3 and 4. It is worth noting that 'fp16_enabled=True' is set in uvtr_cam_vs0.1_pretrain.py, meaning fp16 is used for the render head; in contrast, 'fp16_enabled=False' is set in uvtr_cam_vs0.075_pretrain.py.
All in all, if I want to reproduce the ablation results of Table 3, fp32 should be used in uvtr_cam_vs0.1_pretrain.py, right?
BTW, I tried loading the released pretrained model from Google Cloud (uvtr_cam_vs0.1_pretrain.pth) and finetuning on the downstream detection task. The corresponding results are shown in Line 6, and they still have a gap compared with the results in Table 3.
Could you give some advice on reproducing the results?
Thanks again for your attention, and I look forward to your reply.
Hi, the result in the paper uses the config of uvtr_cam_vs0.1_pretrain.py (indeed use fp16 but I forget why to name it fp32). Do you use 4 A100 GPUs to reproduce the results?
Actually, I tried setting fp32 for the render head, and the performance was slightly lower than that of uvtr_cam_vs0.1_pretrain.py. Unfortunately, our platform differs: we used 8 RTX 3090 GPUs for reproduction.
Hi, please try retraining with 4 GPUs, as the batch sizes may be different and the learning rate needs to be adjusted.
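As a side note for other readers hitting the same GPU-count mismatch: a common heuristic (the linear scaling rule, which is an assumption here, not something stated in the repo) keeps the learning rate proportional to the total batch size, i.e. num_gpus * samples_per_gpu. A small sketch:

```python
def scale_lr(base_lr: float, base_total_batch: int, new_total_batch: int) -> float:
    """Linear LR scaling rule: scale the learning rate in proportion
    to the total batch size (num_gpus * samples_per_gpu).
    The base numbers must come from the reference config."""
    return base_lr * new_total_batch / base_total_batch

# Hypothetical example: if the reference run used 4 GPUs x 4 samples
# and the reproduction uses 8 GPUs x 2 samples, the total batch size
# is unchanged (16), so the learning rate stays the same.
print(scale_lr(2e-4, 4 * 4, 8 * 2))  # -> 0.0002
```

Whether per-GPU batch size (and hence BN statistics and total batch) actually match across 4xA100 vs 8x3090 depends on the configs, so the safest route is still the one suggested above: rerun with the original 4-GPU setup.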
Hi team, thanks for releasing this exceptional work. In the released log (abl_uvtr_cam_vs0.1_finetune.log), a pretrained model can be observed in Line 1236
Does this indicate that the UniPAD model is pretrained with FP32? Also, we notice that the default pretrain setting in this repo is FP16 (fp16_enabled=True). Could you share the comparison between FP32 pretraining and FP16 pretraining?
Thanks again for your attention, and I look forward to your reply.