Sangmin-Bak closed this issue 1 year ago.
I am currently working on this paper, so I will try to explain what I have understood:
1- First, they estimate a depth map with MobileStereoNet-3D (msnet3d) and back-project it into a 3D point cloud. They then build a binary grid M_in in which a voxel is set to 1 if it contains at least one point of the estimated point cloud, so M_in is essentially a 3D occupancy grid.
2- From M_in we can extract the occupied voxels, i.e. the voxels that are not occluded. These are the voxel queries.
3- The voxel queries are provided in sequences_msnet3d_sweep10 because they are precomputed from the data (images). They are loaded in the load_scan() method of VoxFormer/projects/mmdet3d_plugin/datasets/semantic_kitti_dataset_stage2.py and are eventually used as input to the model.
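Steps 1-2 above can be sketched in NumPy as follows. This is only an illustration under simple assumptions, not VoxFormer's code: a pinhole camera with intrinsics fx, fy, cx, cy, points voxelized directly in the camera frame (the real pipeline also transforms them into the LiDAR/voxel frame), and hypothetical helper names. It also skips the QPN: as the reply below points out, the actual query proposals come from the QPN output, not from M_in directly.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map into an (N, 3) point cloud with a pinhole model.
    Pixels with depth <= 0 are treated as invalid and dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

def voxelize_binary(points, origin, voxel_size, grid_shape):
    """Binary occupancy grid (M_in-style): a voxel is 1 if at least one point falls in it."""
    grid = np.zeros(grid_shape, dtype=np.uint8)
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)  # drop out-of-grid points
    idx = idx[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

def occupied_voxel_queries(grid):
    """Return the (K, 3) integer indices of occupied voxels, i.e. candidate query positions."""
    return np.argwhere(grid > 0)

# Toy usage: a 2x2 depth map at 1 m, unit focal length, principal point at (0, 0).
depth = np.ones((2, 2))
points = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
m_in = voxelize_binary(points, origin=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(2, 2, 2))
queries = occupied_voxel_queries(m_in)
```

For a real SemanticKITTI scene the grid is usually 256×256×32 voxels of 0.2 m, so `grid_shape` and `voxel_size` would change accordingly.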
Thanks @Abde951 for the explanation. M_in is a voxelized pseudo point cloud used as input to the query proposal network (QPN), and sequences_msnet3d_sweep10 is the output of the QPN (which takes the voxelization of the previous 10 pseudo sweeps as input). This output is used as the query proposals in stage-2.
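The comment above does not say exactly how the 10 pseudo sweeps are combined before voxelization. The sketch below assumes one plausible scheme: each sweep's pseudo point cloud is moved into the current frame using per-frame 4x4 sensor-to-world poses, and all points are merged into a single binary grid. The pose convention and function names are hypothetical, and the QPN itself is not reproduced.

```python
import numpy as np

def to_current_frame(points, pose_src, pose_cur):
    """Map (N, 3) points from a source sweep's frame into the current frame.
    pose_src / pose_cur are assumed to be 4x4 sensor-to-world transforms."""
    rel = np.linalg.inv(pose_cur) @ pose_src
    homo = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    return (homo @ rel.T)[:, :3]

def voxelize_sweeps(sweeps, poses, cur_idx, origin, voxel_size, grid_shape):
    """Merge several pseudo point clouds into one binary occupancy grid in the current frame."""
    grid = np.zeros(grid_shape, dtype=np.uint8)
    for pts, pose in zip(sweeps, poses):
        p = to_current_frame(pts, pose, poses[cur_idx])
        idx = np.floor((p - np.asarray(origin)) / voxel_size).astype(np.int64)
        inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
        idx = idx[inside]
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Toy usage: two sweeps, the second sensor pose shifted by +1 m along x.
pose0 = np.eye(4)
pose1 = np.eye(4)
pose1[0, 3] = 1.0
sweeps = [np.array([[0.5, 0.5, 0.5]]), np.array([[0.5, 0.5, 0.5]])]
grid_cur0 = voxelize_sweeps(sweeps, [pose0, pose1], 0, (0.0, 0.0, 0.0), 1.0, (2, 1, 1))
grid_cur1 = voxelize_sweeps(sweeps, [pose0, pose1], 1, (0.0, 0.0, 0.0), 1.0, (2, 1, 1))
```

Seen from frame 0, both points land in separate voxels; seen from frame 1, the first sweep's point falls outside the grid and is dropped.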
Thank you for your research. I have a few questions.
I don't know exactly where the voxel queries reside, and I am curious about the following points.