SeaBird-Go opened this issue 1 year ago
1. The backbone and image size may not have much effect on pretraining performance. The occupancy labels must be obtained by fusing point clouds from multiple frames; a single-frame point cloud is too sparse to bring an improvement.
2. BEVDet-OCC uses the pretrained model from BEVDet. For the sake of fairness, you should also use this pretrained model for initialization when training binary occupancy prediction.
3. Focal loss is better than cross-entropy.
4. Semantic occupancy is a more effective pretraining target.
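Point 3 can be illustrated with a minimal NumPy sketch of a sigmoid focal loss for binary occupancy, next to plain binary cross-entropy. The `alpha=0.25, gamma=2.0` defaults are the common ones from the original focal loss paper and are an assumption here, not necessarily the values used by BEVDet-OCC or Occ-BEV.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def binary_cross_entropy(logits, targets, eps=1e-7):
    """Mean BCE over all voxels; targets are 0 (free) or 1 (occupied)."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    loss = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return loss.mean()


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean focal loss over all voxels.

    (1 - p_t) ** gamma down-weights voxels the model already gets right,
    and alpha_t re-weights the occupied vs. free classes.
    """
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    p_t = targets * p + (1 - targets) * (1 - p)  # probability of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    loss = -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
    return loss.mean()
```

Because `(1 - p_t) ** gamma` shrinks toward zero for confidently classified voxels, the many easy free voxels contribute far less to the loss than under cross-entropy, which is the usual motivation for focal loss under heavy class imbalance.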
Thanks for your detailed explanation, and sorry for the late reply; I have been very busy these days.
- I understand that the occupancy GT should be obtained from multi-sweep point clouds. Since BEVDet-OCC uses the semantic occupancy GT from the CVPR 2023 occupancy challenge, I simply treated every voxel that does not belong to the free category as occupied, and obtained the binary occupancy GT this way.
- I know that BEVDet-OCC uses the pretrained BEVDet model for initialization. So, for the sake of fairness, I only loaded the pretrained ResNet-50 model to initialize the backbone and then finetuned on semantic occupancy prediction. In this case I obtained an mIoU of 34.01.
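The binary GT described in the first bullet can be sketched as follows, assuming the Occ3D-nuScenes label convention used in the CVPR 2023 challenge, where `17` is the index of the free class. The optional camera-visibility mask and the `255` ignore value are assumptions for illustration; check them against the actual label files and loss implementation.

```python
import numpy as np

FREE_LABEL = 17  # assumption: index of the "free" class in Occ3D-nuScenes labels
IGNORE = 255     # hypothetical ignore value for voxels outside the camera mask


def semantic_to_binary(semantics, mask_camera=None):
    """Map semantic occupancy labels to 0 (free) / 1 (occupied).

    semantics: integer array of per-voxel class indices.
    mask_camera: optional array, nonzero where a voxel is visible to the cameras;
                 invisible voxels are set to IGNORE so a loss can skip them.
    """
    binary = (semantics != FREE_LABEL).astype(np.uint8)
    if mask_camera is not None:
        binary[~mask_camera.astype(bool)] = IGNORE
    return binary
```

With the challenge data, `semantics` and `mask_camera` would be read from each sample's `labels.npz` (assumed key names).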
For the pretraining case, I also initialized the backbone with the pretrained ResNet-50 model, then pretrained BEVDet-OCC with binary occupancy prediction, and finally finetuned it on semantic occupancy prediction. In this case I obtained an mIoU of 34.79.
The performance improved only slightly (from 34.01 to 34.79 mIoU). I'm not sure what the problem is.
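When a model is pretrained with one head (binary occupancy) and finetuned with another (semantic occupancy), the head weights usually differ in shape, so only part of the checkpoint can transfer. Below is a minimal, framework-agnostic sketch of filtering a pretrained state dict by name and shape; it also lists what was skipped, which makes it easy to confirm that the pretrained weights beyond the backbone were actually loaded. The function and parameter names are hypothetical. With PyTorch, `pretrained` and `target` would be `state_dict()` outputs, and the kept weights could be passed to `model.load_state_dict(kept, strict=False)`.

```python
def filter_matching_weights(pretrained, target):
    """Split pretrained weights into transferable and skipped entries.

    pretrained, target: dicts mapping parameter names to arrays/tensors
    that expose a `.shape` attribute.
    Returns (kept, skipped): weights whose name AND shape match the
    target model, and the names that could not be transferred.
    """
    kept, skipped = {}, []
    for name, tensor in pretrained.items():
        if name in target and tuple(target[name].shape) == tuple(tensor.shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped
```

Printing `skipped` after loading is a cheap sanity check: ideally only the final classification layer of the occupancy head should appear there.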
This result is normal. Occ-BEV does not improve scene completion as much as it improves 3D detection, and the results in the first version of my paper were similar to yours.
@chaytonmin So what key changes did you make to achieve the 3.14% improvement in the latest version of your paper?
TTA (test-time augmentation).
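Test-time augmentation runs the model on several transformed copies of the input, maps each prediction back to the original frame, and averages the results. The sketch below is a toy version, assuming a model that takes a voxel-aligned input grid and returns per-voxel class probabilities; `tta_predict`, `identity`, and `flip_x` are hypothetical names, and it does not reproduce the exact TTA used in the paper. In a camera-based model such as BEVDet-OCC, a flip would have to be applied consistently to the images or BEV features and then undone on the occupancy output.

```python
import numpy as np


def tta_predict(predict, inputs, augmentations):
    """Average per-voxel class probabilities over augmented views.

    predict: callable mapping an input grid of shape (X, Y, Z) to
             class probabilities of shape (X, Y, Z, C).
    augmentations: list of (forward, inverse) pairs; `forward` transforms
                   the input, `inverse` maps the prediction back.
    """
    probs = [inverse(predict(forward(inputs))) for forward, inverse in augmentations]
    return np.mean(probs, axis=0)


# Hypothetical augmentations: identity and a flip along the X axis.
identity = (lambda a: a, lambda a: a)
flip_x = (lambda a: a[::-1], lambda a: a[::-1])
```

Averaging probabilities rather than hard labels lets views that disagree soften each other; with a perfectly flip-equivariant model, the TTA output equals the single-view prediction.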
Hi, thanks for sharing this work.
When I pretrain BEVDet-OCC by predicting binary occupancy and then finetune it, the results barely improve compared with finetuning from the pretrained ResNet-50 model.
I don't know why. I noticed that you ran the occupancy prediction experiments on BEVStereo with a stronger backbone and a larger image size (I use 256x704), so I wonder whether the backbone and image size have a decisive effect on pretraining performance.