PJLab-ADG / 3DTrans

An open-source codebase for exploring autonomous driving pre-training
https://bobrown.github.io/Team_3DTrans.github.io/
Apache License 2.0
585 stars · 72 forks

Questions on detection heads of Uni3D #30

Closed · wangyubin04 closed this issue 4 months ago

wangyubin04 commented 7 months ago

Hello, thanks for your great work! Have you considered training only one detection head for different datasets, since the detection task across datasets is basically based on three classes: car, pedestrian, and cyclist? Although there are some differences between objects of the same category in different datasets, I am curious whether sharing the detection head across datasets would make the performance about the same, or even better.

BOBrown commented 7 months ago

@wangyubin04 I understand your question to be about using the same detector with the same training parameters on different datasets, for example, training the baseline PV-RCNN on Waymo but testing it on nuScenes. You could refer to Bi3D (CVPR 2023) for such research. The main observation of Bi3D is that the detection accuracy of the same detector drops significantly when it is tested on a different benchmark or dataset (please see the Waymo-KITTI results in Table 1), where both the detection head and the backbone are trained on a single dataset.

BOBrown commented 7 months ago

@wangyubin04 Besides, please refer to the experimental results in Tables 3 and 4 of Uni3D, where Voxel-RCNN (w/o P.T.) denotes that the detector is trained on the merged datasets (directly merging Waymo and nuScenes). A serious performance drop can be observed, due to dataset interference caused by the many differences between 3D outdoor scenes, including LiDAR sensor differences, object-size differences, LiDAR installation heights, etc.
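For clarity, the "directly merging" baseline discussed above can be sketched as follows. This is an illustrative toy, not the actual 3DTrans training code; the dummy dataset and its point counts are placeholders standing in for the real Waymo/nuScenes loaders:

```python
import torch
from torch.utils.data import ConcatDataset, Dataset


class DummyPointCloudDataset(Dataset):
    """Stand-in for a real LiDAR dataset loader (hypothetical)."""

    def __init__(self, n_frames: int, num_points: int):
        self.n_frames = n_frames
        self.num_points = num_points

    def __len__(self):
        return self.n_frames

    def __getitem__(self, idx):
        # (num_points, 4): x, y, z, intensity. Per-frame point counts
        # differ a lot across sensors (e.g. 64-beam vs 32-beam LiDAR).
        return torch.zeros(self.num_points, 4)


# Naive merging: frames from both datasets are simply concatenated, so a
# single detector sees conflicting class definitions, point densities,
# and sensor heights in one training stream -- the "dataset interference"
# described above.
merged = ConcatDataset([
    DummyPointCloudDataset(100, 180_000),  # Waymo-like density
    DummyPointCloudDataset(100, 30_000),   # nuScenes-like density
])
print(len(merged))  # 200
```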

wangyubin04 commented 7 months ago

Thanks for your response! What I mean is: in the MDF setting, is it possible to directly replace the multiple dataset-specific heads in Fig. 3 of Uni3D with one dataset-agnostic head plus a designed class mapping for each dataset, while keeping the other modules unchanged? If there is a significant performance drop, is it caused by the fact that the inputs to the detection head can vary a lot across datasets?

BOBrown commented 7 months ago

@wangyubin04 Good point! But one concern is that the class mapping you designed may cause fluctuations in the training loss. Given that Waymo and nuScenes have inconsistent definitions of 'Vehicle' ('car' for nuScenes), the foreground candidates selected for Waymo samples and nuScenes samples may differ. For example, for nuScenes, the loss for the 'car' class is calculated only on the cars in the given points, but for Waymo, the loss for the 'Vehicle' class is calculated on all four-wheeled vehicles in the given points, including 'bus', 'van', etc. As a result, a naive class mapping may not work well.
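To make the inconsistency concrete, here is a minimal sketch of such a naive class mapping. All names and the mapping itself are illustrative assumptions, not the actual 3DTrans configuration:

```python
# Unified label space shared by the hypothetical dataset-agnostic head.
UNIFIED_CLASSES = ["Vehicle", "Pedestrian", "Cyclist"]

# Illustrative per-dataset mapping (not the real Uni3D config). Note the
# asymmetry: Waymo's 'Vehicle' already bundles cars, buses, vans, etc.,
# while nuScenes annotates them as separate fine-grained classes, so the
# unified 'Vehicle' class means different things in each source dataset.
CLASS_MAP = {
    "waymo": {
        "Vehicle": "Vehicle",
        "Pedestrian": "Pedestrian",
        "Cyclist": "Cyclist",
    },
    "nuscenes": {
        "car": "Vehicle",
        "bus": "Vehicle",
        "truck": "Vehicle",
        "pedestrian": "Pedestrian",
        "bicycle": "Cyclist",
    },
}


def map_labels(dataset: str, labels: list) -> list:
    """Map per-dataset class names to indices in the unified space."""
    mapping = CLASS_MAP[dataset]
    return [UNIFIED_CLASSES.index(mapping[name]) for name in labels]


print(map_labels("nuscenes", ["car", "bus", "pedestrian"]))  # [0, 0, 1]
```

Even with this table, the foreground supervision still differs per dataset: a nuScenes 'car' box and a Waymo 'Vehicle' box both land on unified index 0 but cover different object populations, which is exactly the loss-fluctuation concern above.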

However, a dataset-agnostic head is a more intuitive design for the MDF setting, and it makes it easier to observe where the performance drops come from. That is why we propose the dataset-specific detection heads to tackle the inconsistent class definitions between different 3D datasets. If a dataset-agnostic detection head can also solve this class-inconsistency issue, that would be very good for the community. On the other hand, another challenge in the MDF setting is the inconsistent 3D point data distribution, which remains under-explored!
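The structural difference between the two designs can be sketched as below. This is a toy illustration under assumed names; the real Uni3D heads are full 3D detection heads, not single linear layers:

```python
import torch
import torch.nn as nn


class MultiDatasetDetector(nn.Module):
    """Toy sketch: shared backbone + per-dataset detection heads."""

    def __init__(self, feat_dim: int = 128, num_classes: int = 3,
                 datasets=("waymo", "nuscenes")):
        super().__init__()
        # Dataset-agnostic feature extractor shared by all datasets.
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
        )
        # One head per dataset, so each dataset's class definitions
        # (e.g. Waymo 'Vehicle' vs nuScenes 'car') are supervised
        # separately and never interfere in the classification loss.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, num_classes) for name in datasets
        })

    def forward(self, x: torch.Tensor, dataset: str) -> torch.Tensor:
        feats = self.backbone(x)
        return self.heads[dataset](feats)


model = MultiDatasetDetector()
x = torch.randn(4, 128)      # 4 dummy proposal features
logits = model(x, "nuscenes")
print(logits.shape)          # torch.Size([4, 3])
```

The dataset-agnostic alternative discussed in this thread would replace `self.heads` with a single shared head plus a class mapping, trading per-dataset supervision for a unified label space.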