Another question: how is the decomposition of the red voxels performed?
Thanks for your interest! The point cloud is the raw signal collected by the LiDAR sensor, so we do not need to generate it manually. MDU only generates camera virtual points; the LiDAR signals merely provide ground-truth depths for reference. With the original LiDAR point cloud and the camera virtual points generated by MDU, we voxelize the 3D space, and each voxel can be grouped as LiDAR-only, camera-only, or LiDAR-camera according to the modalities of the points it contains. As for the red voxels, since the LiDAR and camera features have not been fused yet after voxelization, the decomposition simply extracts the features of each modality separately for further processing. Hope it helps.
Do you mean that voxels containing both original LiDAR points and virtual points generated by MDU are considered LiDAR-camera combined, while voxels containing only virtual points generated by MDU are considered camera-only?
Yes. And voxels with only LiDAR points are considered LiDAR-only.
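For future readers, here is a minimal sketch of the grouping rule described above, assuming a simple numpy-based voxelization. The function names, voxel size, and toy data are illustrative placeholders, not the actual MSMDFusion implementation:

```python
import numpy as np

def voxel_keys(points, voxel_size):
    # Map each point to the integer index of the voxel it falls into.
    return [tuple(idx) for idx in np.floor(points / voxel_size).astype(np.int64)]

def group_voxels(lidar_points, virtual_points, voxel_size=0.2):
    """Classify voxels as 'lidar' (blue), 'camera' (yellow), or 'both' (red)
    according to the modalities of the points they contain."""
    lidar_vox = set(voxel_keys(lidar_points, voxel_size))
    camera_vox = set(voxel_keys(virtual_points, voxel_size))

    groups = {}
    for v in lidar_vox | camera_vox:
        if v in lidar_vox and v in camera_vox:
            groups[v] = 'both'    # red: raw LiDAR points and MDU virtual points
        elif v in lidar_vox:
            groups[v] = 'lidar'   # blue: raw LiDAR points only
        else:
            groups[v] = 'camera'  # yellow: MDU virtual points only
    return groups

def decompose_red_voxel(voxel, lidar_points, virtual_points, voxel_size=0.2):
    """For a 'both' voxel, the LiDAR and camera features are not fused yet,
    so decomposition just returns the two modalities' points separately."""
    lidar_mask = [k == voxel for k in voxel_keys(lidar_points, voxel_size)]
    camera_mask = [k == voxel for k in voxel_keys(virtual_points, voxel_size)]
    return lidar_points[lidar_mask], virtual_points[camera_mask]

# Toy example: one shared voxel, one LiDAR-only voxel, one camera-only voxel
lidar = np.array([[0.10, 0.10, 0.10], [1.10, 0.00, 0.00]])
virtual = np.array([[0.15, 0.12, 0.05], [2.30, 0.00, 0.00]])
print(group_voxels(lidar, virtual))
```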
thanks!
Hello, I am confused about the GMA-Conv module in the MSMDFusion paper. Before this module, the voxels in 3D space are divided into three categories: LiDAR-only (blue voxels), camera-only (yellow voxels), and LiDAR-camera combined (red voxels). From Figure 2, it appears that the yellow voxels are generated by the MDU module, but shouldn't the MDU module produce LiDAR-camera combined voxels (red voxels)? In that case, where do the red voxels come from? And if a pixel does not have depth information provided by LiDAR, how does it become a camera-only voxel? Perhaps my understanding is not accurate, so I would appreciate your guidance!