AIR-THU / DAIR-V2X


multimodal fusion problem #72

Closed · wanghangege closed this 7 months ago

wanghangege commented 11 months ago

Hello, can DAIR-V2X-I be used for multimodal fusion? The image labels and the LiDAR labels in this dataset are separate, and the projection parameters do not seem to align the two well. Does this mean the dataset cannot be used for multimodal fusion?

CleanSeaSalt commented 8 months ago

I also ran into this problem. The LiDAR labels seem to be inconsistent with the camera labels, so which labels should I refer to when doing multimodal fusion? [screenshot] On top of that, the label boxes are sometimes offset when projected onto the image. [screenshots]

haibao-yu commented 7 months ago

> Hello, can DAIR-V2X-I be used for multimodal fusion? The image labels and the LiDAR labels in this dataset are separate, and the projection parameters do not seem to align the two well. Does this mean the dataset cannot be used for multimodal fusion?

Thanks for your question. You can convert the dataset into KITTI format following https://github.com/AIR-THU/DAIR-V2X/blob/main/docs/data_converter.md, and then use mmdetection3d to train a multimodal model on the converted dataset.
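
For reference, here is a minimal Python sketch of that pipeline. The converter script path and flags are taken from the linked data_converter.md and may vary across repo versions, and the MMDetection3D config name below is a placeholder, not a config shipped with the repo:

```python
import subprocess

# Step 1: convert DAIR-V2X-I to KITTI format. Script path and flags follow
# docs/data_converter.md; check the doc for the repo version you have.
subprocess.run([
    "python", "tools/dataset_converter/dair2kitti.py",
    "--source-root", "data/single-infrastructure-side",
    "--target-root", "data/single-infrastructure-side-kitti",
    "--split-path", "data/split_datas/single-infrastructure-split-data.json",
    "--label-type", "lidar",           # use LiDAR-frame labels as ground truth
    "--sensor-view", "infrastructure",
], check=True)

# Step 2: train a multimodal (camera + LiDAR) model with MMDetection3D.
# "mvxnet_dair.py" is a hypothetical config you would write against the
# converted KITTI-format data.
subprocess.run(
    ["python", "tools/train.py", "configs/mvx_net/mvxnet_dair.py"],
    check=True,
)
```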

haibao-yu commented 7 months ago

> I also ran into this problem. The LiDAR labels seem to be inconsistent with the camera labels, so which labels should I refer to when doing multimodal fusion? On top of that, the label boxes are sometimes offset when projected onto the image.

Yes, there is no strict synchronization between the camera and the LiDAR. Consequently, we supply two sets of labels, one for the camera data and one for the LiDAR data. For multimodal fusion detection, we recommend using LiDAR as the primary sensor, with its labels serving as the ground truth.
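
Because of that asynchrony, projected LiDAR boxes can land a few pixels off in the image, as in the screenshots above. A quick way to inspect the offset is to project the LiDAR-frame box centers through the published calibration. Below is a minimal sketch assuming the infrastructure-side directory layout and JSON field names (`rotation`, `translation`, `cam_K`, `3d_location`) from the public DAIR-V2X-I release; verify the paths and keys against your copy:

```python
import json
import numpy as np

# Illustrative frame id and dataset root; adjust to your local layout.
FRAME = "000009"
ROOT = "data/single-infrastructure-side"

def load_json(path):
    with open(path) as f:
        return json.load(f)

# Extrinsic: (virtual) LiDAR -> camera. Field names follow the DAIR-V2X-I
# release ("rotation": 3x3, "translation": 3x1); verify against your copy.
ext = load_json(f"{ROOT}/calib/virtuallidar_to_camera/{FRAME}.json")
R = np.array(ext["rotation"], dtype=float).reshape(3, 3)
t = np.array(ext["translation"], dtype=float).reshape(3, 1)

# Intrinsic: 3x3 camera matrix stored as a flat list under "cam_K".
intr = load_json(f"{ROOT}/calib/camera_intrinsic/{FRAME}.json")
K = np.array(intr["cam_K"], dtype=float).reshape(3, 3)

# LiDAR-frame labels: use these as ground truth for fusion, per the
# maintainers' recommendation above.
labels = load_json(f"{ROOT}/label/virtuallidar/{FRAME}.json")

for obj in labels:
    loc = obj["3d_location"]
    p_lidar = np.array(
        [float(loc["x"]), float(loc["y"]), float(loc["z"])]
    ).reshape(3, 1)
    p_cam = R @ p_lidar + t          # LiDAR frame -> camera frame
    if p_cam[2, 0] <= 0:             # skip objects behind the camera
        continue
    uv = K @ p_cam                   # camera frame -> pixel coordinates
    u, v = uv[0, 0] / uv[2, 0], uv[1, 0] / uv[2, 0]
    print(obj.get("type"), f"center projects to ({u:.1f}, {v:.1f})")
```

Comparing these projected centers against the camera-frame labels (or the image itself) makes the residual misalignment from the camera/LiDAR time offset directly visible.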