Thinklab-SJTU / Bench2DriveZoo

BEVFormer, UniAD, VAD in Closed-Loop CARLA Evaluation with World Model RL Expert Think2Drive

Cannot reproduce the open-loop evaluation results of BEVFormer-Base on Mini Dataset #40

Closed: haibao-yu closed this issue 1 month ago

haibao-yu commented 2 months ago

Thanks for your excellent work. However, there is a large gap between my reproduced results and the reported results of BEVFormer-Base on the Mini Dataset.

The reported result for BEVFormer-Base is mAP 0.63, but my reproduced result is mAP 0.2094, as follows:

mAP: 0.2094
mATE: 0.5238
mASE: 0.2640
mAOE: 0.2195
mAVE: 1.2025
NDS: 0.3377

Per-class results:

Object Class     AP      ATE     ASE     AOE     AVE
car              0.130   0.236   0.057   0.022   2.397
van              0.000   1.000   1.000   1.000   1.000
truck            0.213   0.456   0.226   0.051   5.397
bicycle          0.391   0.503   0.080   0.115   0.230
traffic_sign     0.237   0.416   0.045   0.052   0.011
traffic_cone     0.347   0.745   0.283   0.220   0.022
traffic_light    0.067   0.483   0.351   0.126   0.023
pedestrian       0.289   0.350   0.070   0.171   0.541

Command Used

./adzoo/uniad/uniad_dist_eval.sh ./adzoo/uniad/configs/stage1_track_map/base_track_map_b2d.py ./ckpts/bevformer_base_b2d.pth 1

Dataset: your provided Mini Dataset (10 clips).

Could you provide any suggestions? Looking forward to your reply.

zhiyuanzzz commented 2 months ago

@haibao-yu The results we report are measured on the validation set. When selecting the validation set, we tried to ensure an even distribution of scenes and weather conditions, with all object categories represented. The mini set does not have these characteristics, and because of the significant difference in data distribution the results may vary greatly. Additionally, please evaluate with the following command:

./adzoo/bevformer/dist_test.sh ./adzoo/bevformer/configs/bevformer/bevformer_base_b2d.py ./ckpts/bevformer_base_b2d.pth 1

haibao-yu commented 2 months ago

@zhiyuanzzz Thank you for your reply. I followed your instructions and used the command you provided to evaluate the BEVFormer-Base model. The result is now much closer to the reported metric (mAP 0.5604 vs. 0.63), which suggests that the configuration difference was responsible for the large gap in my earlier run.

However, there is another critical issue related to UniAD training. UniAD is trained with the base_track_map_b2d.py (stage 1) or base_e2e_b2d.py (stage 2) configs, using bevformer_base_b2d.pth as the pre-trained perception model. When I evaluate base_track_map_b2d.py / base_e2e_b2d.py with bevformer_base_b2d.pth, there is a severe drop in detection performance (see the numbers and the sketch below), which could significantly affect the overall UniAD training process.

config: ./adzoo/bevformer/configs/bevformer/bevformer_base_b2d.py, checkpoint: bevformer_base_b2d.pth, mAP: 0.56, AP-Car: 0.739

config: adzoo/uniad/configs/stage1_track_map/base_track_map_b2d.py, checkpoint: bevformer_base_b2d.pth, mAP: 0.137, AP-Car: 0.123

config: adzoo/uniad/configs/stage2_e2e/base_e2e_b2d.py, checkpoint: bevformer_base_b2d.pth, mAP: 0.21, AP-Car: 0.131
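For reference, here is a rough sketch of the check I used to quantify how much of the checkpoint each config actually consumes. It assumes the upstream-style mmcv `Config.fromfile` and mmdet `build_detector` interfaces and that the checkpoint stores its weights under a `state_dict` entry, as mmcv checkpoints typically do; the copies bundled with this repo may expose these under different import paths, so treat it as an illustration rather than a drop-in script:

```python
import torch
from mmcv import Config                  # upstream-style import; may differ in this repo's bundled mmcv
from mmdet.models import build_detector  # upstream-style import; may differ here as well

CKPT = "./ckpts/bevformer_base_b2d.pth"

def report_overlap(config_path):
    """Count how many model parameters are actually covered by the checkpoint."""
    cfg = Config.fromfile(config_path)
    model = build_detector(cfg.model)
    ckpt_keys = set(torch.load(CKPT, map_location="cpu")["state_dict"])
    model_keys = set(model.state_dict())
    print(config_path)
    print("  keys filled from checkpoint:", len(ckpt_keys & model_keys))
    print("  keys left at random init   :", len(model_keys - ckpt_keys))
    print("  checkpoint keys unused     :", len(ckpt_keys - model_keys))

report_overlap("./adzoo/bevformer/configs/bevformer/bevformer_base_b2d.py")
report_overlap("./adzoo/uniad/configs/stage1_track_map/base_track_map_b2d.py")
```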

What are your thoughts on this issue?

zhiyuanzzz commented 2 months ago

@haibao-yu The UniADTrack head in UniAD uses more modules to predict boxes and scores than the original BEVFormer, so you cannot directly load BEVFormer weights to evaluate UniAD on the detection task.
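A toy illustration of this point (the class and module names below are made up, not the real Bench2DriveZoo modules): loading the smaller model's weights into the larger one with `strict=False` only covers part of the parameters, and the extra heads keep their random initialization, which is why detection metrics collapse.

```python
import torch.nn as nn

# Toy illustration only; these are not the real Bench2DriveZoo modules.
class BEVFormerLike(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(256, 256)
        self.bbox_head = nn.Linear(256, 10)

class UniADTrackLike(BEVFormerLike):
    def __init__(self):
        super().__init__()
        # Extra modules for tracking/scoring with no counterpart in a
        # BEVFormer checkpoint.
        self.query_interaction = nn.Linear(256, 256)
        self.track_score_head = nn.Linear(256, 1)

# Only the shared parameters are filled; the extra heads stay randomly initialized.
result = UniADTrackLike().load_state_dict(BEVFormerLike().state_dict(), strict=False)
print(result.missing_keys)
# ['query_interaction.weight', 'query_interaction.bias',
#  'track_score_head.weight', 'track_score_head.bias']
```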

haibao-yu commented 2 months ago

@zhiyuanzzz Thank you for the information. However, the evaluation results of UniAD using the provided uniad_base_b2d.pth model show an mAP of 0.1372. This suggests that the trained UniAD model may have poor detection performance. Could you provide further insights into this discrepancy?

jiaxiaosong1002 commented 2 months ago

@haibao-yu It may be related to the tuning of stage 1 in UniAD. We strictly followed UniAD's official code for training; you may adjust the settings as needed to resolve the issue.