Another question: how is the decomposition of the red voxels performed?
Thanks for your interest! The point cloud is the raw signal collected by the LiDAR sensor, so we do not need to generate it manually. MDU only generates camera virtual points; the LiDAR signals merely provide ground-truth depths for reference. With the original LiDAR point cloud and the camera virtual points generated by MDU, we voxelize the 3D space, and each voxel can be grouped as LiDAR-only, camera-only, or LiDAR-camera according to the modalities of the points it contains. As for the red voxels, since the LiDAR and camera features have not been fused yet after voxelization, the decomposition simply extracts the features of each modality separately for further processing. Hope it helps.
Do you mean that voxels containing both original LiDAR points and virtual points generated by MDU are considered LiDAR-camera combined, while voxels containing only virtual points generated by MDU are considered camera-only?
Yes. And voxels with only LiDAR points are considered LiDAR-only.
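For future readers, here is a minimal sketch of the grouping rule described above, assuming a simple numpy-based voxelization. The function names, voxel size, and toy data are illustrative placeholders, not the actual MSMDFusion implementation:

```python
import numpy as np

def voxel_keys(points, voxel_size):
    # Map each point to the integer index of the voxel it falls into.
    return [tuple(idx) for idx in np.floor(points / voxel_size).astype(np.int64)]

def group_voxels(lidar_points, virtual_points, voxel_size=0.2):
    """Classify voxels as 'lidar' (blue), 'camera' (yellow), or 'both' (red)
    according to the modalities of the points they contain."""
    lidar_vox = set(voxel_keys(lidar_points, voxel_size))
    camera_vox = set(voxel_keys(virtual_points, voxel_size))

    groups = {}
    for v in lidar_vox | camera_vox:
        if v in lidar_vox and v in camera_vox:
            groups[v] = 'both'    # red: raw LiDAR points and MDU virtual points
        elif v in lidar_vox:
            groups[v] = 'lidar'   # blue: raw LiDAR points only
        else:
            groups[v] = 'camera'  # yellow: MDU virtual points only
    return groups

def decompose_red_voxel(voxel, lidar_points, virtual_points, voxel_size=0.2):
    """For a 'both' voxel, the LiDAR and camera features are not fused yet,
    so decomposition just returns the two modalities' points separately."""
    lidar_mask = [k == voxel for k in voxel_keys(lidar_points, voxel_size)]
    camera_mask = [k == voxel for k in voxel_keys(virtual_points, voxel_size)]
    return lidar_points[lidar_mask], virtual_points[camera_mask]

# Toy example: one shared voxel, one LiDAR-only voxel, one camera-only voxel
lidar = np.array([[0.10, 0.10, 0.10], [1.10, 0.00, 0.00]])
virtual = np.array([[0.15, 0.12, 0.05], [2.30, 0.00, 0.00]])
print(group_voxels(lidar, virtual))
```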
thanks!
Hello, I am confused about the GMA-Conv module in the MSMDFusion paper. Before this module, the voxels in 3D space are divided into three categories: LiDAR-only (blue voxels), camera-only (yellow voxels), and LiDAR-camera combined (red voxels). From Figure 2, it appears that the yellow voxels are generated by the MDU module, but shouldn't the MDU module produce LiDAR-camera combined voxels (red voxels)? In that case, where do the red voxels come from? And if a pixel does not have depth information provided by LiDAR, how does it become a camera-only voxel? Perhaps my understanding is not accurate, so I would appreciate your guidance!