CVMI-Lab / ST3D

(CVPR 2021 & T-PAMI 2022) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection & ST3D++: Denoised Self-training for Unsupervised Domain Adaptation on 3D Object Detection
Apache License 2.0

pvrcnn performance on nuscenes dataset #36

Closed dzin18 closed 2 years ago

dzin18 commented 2 years ago

I have several questions about the ST3D model from my experiments, thanks!

  1. I changed the config file to support multi-category detection for adapting from nuScenes to KITTI, but PV-RCNN trained on nuScenes cannot detect cyclists well. The mean AP after 50 epochs is: car 0.7173, pedestrian 0.6353, motorcycle 0.0, truck 0.5662, bus 0.8889, bicycle 0.0, construction_vehicle 0.0, trailer 0.0. I also see that many nuScenes SOTA models such as CenterPoint fuse multiple lidar frames (max_sweep=10), but the provided pvrcnn_old_anchor_ros.yaml sets max_sweep=1.
  2. What is the meaning of 'anchor_bottom_heights' for different categories? In the Waymo dataset this parameter is set to [0], but for nuScenes the categories differ a lot.
  3. Have you tested MEMORY_ENSEMBLE in the pseudo-label self-training process? I found the default setting is Enable: false.
  4. Could you share the ST3D++ code that combines the source-domain dataset to co-train the model? Thank you!
jihanyang commented 2 years ago

Hello,

  1. You need to do class mapping for nuScenes, since it only has bicycle and motorcycle classes but no cyclist class (see the class-mapping sketch after this list). As for the number of sweeps, when we built ST3D we wanted a setting that is more uniform across different source and target domains; KITTI, for example, has no sweep information. Furthermore, the multi-frame -> single-frame result is reported in ST3D++. Finally, we have tried adapting Waymo-pretrained models to nuScenes with max_sweep=5, but the adaptation results actually decreased.
  2. Yes. From my perspective, this is not important; just keeping it the same between the source and target models is enough. You can change the KITTI config to a different anchor height and train from scratch, and I think the performance will be similar (see the anchor-config sketch after this list).
  3. I am not sure, since most of our experiments are built on PV-RCNN. This config probably just follows the PV-RCNN waymo -> kitti setting, which does not update pseudo labels and therefore does not need memory ensemble. In my experience, memory ensemble will not make the results poorer. One more observation from ST3D++: memory ensemble is important for pedestrian and cyclist (a simplified sketch of the consensus idea follows after this list).
  4. Actually, making the ST3D++ code available is not hard. However, I need to spend a lot of time checking the reproducibility of those configs, which is time-consuming for me. I might release it, but that still needs some time. Moreover, I think making ST3D++ work with more recent PyTorch and CUDA versions is important, so I will work on that first.
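
On point 1, a minimal sketch of the kind of class mapping I mean; the mapping below is an illustrative assumption (e.g. whether motorcycle should count as Cyclist is your call), not the mapping shipped in the repo's dataset configs:

```python
# Hypothetical nuScenes -> KITTI-style class mapping (illustrative only;
# check the dataset configs in the repo for the mapping actually used).
NUSC_TO_KITTI_CLASS = {
    'car': 'Car',
    'pedestrian': 'Pedestrian',
    'bicycle': 'Cyclist',    # nuScenes has no "cyclist"; bicycle is the closest class
    'motorcycle': 'Cyclist', # optional: map motorcycle too, or drop it
}

def map_class_names(nusc_names):
    """Map nuScenes class names to KITTI-style names, dropping classes
    with no reasonable counterpart (truck, trailer, ...)."""
    return [NUSC_TO_KITTI_CLASS[n] for n in nusc_names if n in NUSC_TO_KITTI_CLASS]
```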
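On point 2, anchor_bottom_heights is the z-coordinate of an anchor's bottom face used by the anchor generator. A sketch of what per-class entries look like in an OpenPCDet-style config; the sizes and heights here are assumptions for illustration, not values from the released yaml files:

```python
# Illustrative anchor generator entries (OpenPCDet-style dicts).
# Exact values are assumptions; consult the repo's yaml configs.
ANCHOR_GENERATOR_CONFIG = [
    {
        'class_name': 'car',
        'anchor_sizes': [[4.6, 1.9, 1.7]],
        'anchor_bottom_heights': [-1.8],  # z of the anchor's bottom face (meters)
    },
    {
        'class_name': 'pedestrian',
        'anchor_sizes': [[0.8, 0.6, 1.7]],
        'anchor_bottom_heights': [-1.8],  # keep consistent between source and target
    },
]
```

The practical takeaway from the answer above is consistency: whatever bottom height the source model was trained with, keep the same value when fine-tuning on the target domain.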
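On point 3, a rough sketch of the consensus idea behind memory ensemble: match the newly generated pseudo boxes against those stored from the previous round by IoU and keep the higher-scoring box of each matched pair. This is a simplification under that assumption, not the released implementation; `iou_fn` stands in for whatever BEV/3D IoU utility you already use:

```python
import numpy as np

def memory_ensemble(prev_boxes, prev_scores, new_boxes, new_scores,
                    iou_fn, match_thresh=0.1):
    """Simplified pseudo-label consensus between two self-training rounds.

    prev_boxes/new_boxes: (N, 7) / (M, 7) arrays of 3D boxes.
    iou_fn: callable returning an (N, M) IoU matrix (e.g. a BEV IoU utility).
    Returns merged boxes and scores.
    """
    if len(prev_boxes) == 0:
        return new_boxes, new_scores
    if len(new_boxes) == 0:
        return prev_boxes, prev_scores

    iou = iou_fn(prev_boxes, new_boxes)   # (N, M) pairwise overlaps
    best_new = iou.argmax(axis=1)         # best new match for each old box
    matched = iou.max(axis=1) >= match_thresh

    keep_boxes, keep_scores, used_new = [], [], set()
    for i, (is_matched, j) in enumerate(zip(matched, best_new)):
        if is_matched:
            used_new.add(int(j))
            # keep the higher-confidence box of the matched pair
            if new_scores[j] >= prev_scores[i]:
                keep_boxes.append(new_boxes[j]); keep_scores.append(new_scores[j])
            else:
                keep_boxes.append(prev_boxes[i]); keep_scores.append(prev_scores[i])
        else:
            # unmatched old box: keep it for now (the full method also tracks
            # how often a box goes unmatched and eventually discards it)
            keep_boxes.append(prev_boxes[i]); keep_scores.append(prev_scores[i])

    for j in range(len(new_boxes)):
        if j not in used_new:
            keep_boxes.append(new_boxes[j]); keep_scores.append(new_scores[j])

    return np.stack(keep_boxes), np.asarray(keep_scores)
```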