I roughly visualized the (128, 128, 16) query output from the first stage of VoxFormer, and it looks like the image above. The paper says LMSCNet is used for depth correction, but in practice, is it fair to say that the query they use is a higher-scale, complete voxel map obtained from the occupancy-map ground truth, based on the depth pseudo-LiDAR?
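
For anyone wanting to reproduce this kind of rough visualization, here is a minimal sketch of turning a (128, 128, 16) occupancy grid into 3D points that can be passed to any scatter plotter. It is not taken from the VoxFormer code: the function name `occupied_voxel_centers`, the 0.4 m voxel size, and the zero origin are assumptions for illustration, and the grid below is synthetic rather than an actual stage-1 output.

```python
import numpy as np

def occupied_voxel_centers(grid, voxel_size=0.4, origin=(0.0, 0.0, 0.0)):
    """Return the metric centers of all occupied voxels as an (N, 3) array.

    voxel_size and origin are assumed values; replace them with the ones
    from your dataset config.
    """
    grid = np.asarray(grid)
    idx = np.argwhere(grid > 0)  # integer (x, y, z) indices of occupied cells
    return (idx + 0.5) * voxel_size + np.asarray(origin)

# Synthetic stand-in for the query grid (NOT real VoxFormer output).
query = np.zeros((128, 128, 16), dtype=np.uint8)
query[10:20, 30:40, 0:2] = 1  # a small occupied box: 10 * 10 * 2 voxels

centers = occupied_voxel_centers(query)
print(centers.shape)  # (200, 3)
```

Comparing the number of occupied voxels in the query against the number produced by the raw depth pseudo-LiDAR alone is one quick way to see how much the correction step fills in.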