
Inspect Semantic Mapping pipeline in Paper #4

Open · gauravkuppa opened this issue 3 years ago

gauravkuppa commented 3 years ago

Semantic Mapping

(figure: semantic mapping module diagram from the paper)

The depth observation is used to compute a point cloud. Each point in the point cloud is associated with the predicted semantic categories.

Do we need to use a depth sensor, or can we use the output of a depth-estimation network? HabitatNav says the method is not dependent on a highly accurate depth reading.
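The back-projection is the same whichever depth source we use, so a depth network's output could be dropped in directly. A minimal sketch, assuming a pinhole camera with a known horizontal field of view (`fov_deg` is my parameter name, not from the paper):

```python
import torch

def depth_to_point_cloud(depth: torch.Tensor, fov_deg: float = 90.0) -> torch.Tensor:
    """Back-project a depth map (H, W), in meters, into a camera-frame
    point cloud (H*W, 3). Works identically whether `depth` comes from
    a sensor or a monocular depth network."""
    H, W = depth.shape
    # Focal length derived from the horizontal field of view (pinhole model).
    f = (W / 2.0) / torch.tan(torch.deg2rad(torch.tensor(fov_deg / 2.0)))
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x = (xs - W / 2.0) * depth / f
    y = (ys - H / 2.0) * depth / f
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)
```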

The semantic categories are predicted using a pretrained Mask RCNN [18] on the RGB observation.

Use a model pre-trained on COCO, restricted to a selected subset of labels. TODO: decide which labels to keep.
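torchvision ships a COCO-pretrained Mask R-CNN whose detections we can filter down to a category subset. The six ids below are common indoor ObjectNav goals (chair, couch, potted plant, bed, toilet, tv) in torchvision's 91-class COCO indexing; treat them as a starting guess rather than the paper's final list:

```python
import torch
import torchvision

# COCO-pretrained Mask R-CNN from torchvision.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Candidate label subset (torchvision 91-class COCO indexing); a guess,
# not necessarily the paper's final list.
SELECTED_IDS = {62: "chair", 63: "couch", 64: "potted plant",
                65: "bed", 70: "toilet", 72: "tv"}

@torch.no_grad()
def segment(rgb: torch.Tensor, score_thresh: float = 0.7):
    """rgb: float tensor (3, H, W) in [0, 1]. Returns (category, mask)
    pairs for the selected labels only."""
    out = model([rgb])[0]
    return [(SELECTED_IDS[int(l)], m[0] > 0.5)
            for l, m, s in zip(out["labels"], out["masks"], out["scores"])
            if s > score_thresh and int(l) in SELECTED_IDS]
```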

Each point in the point cloud is then projected in 3D space using differentiable geometric computations to get the voxel representation. The voxel representation is then converted to the semantic map.

Not sure how to implement this; a sketch follows below.
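One way to read "differentiable geometric computations": discretize each point into a voxel index and splat its per-category scores into the grid with `index_add_`, which is differentiable with respect to the feature values (the segmentation scores), though not with respect to the point coordinates. A sketch, assuming points are already transformed into the map frame with the origin at a grid corner; the grid size and 5 cm cells are my assumptions:

```python
import torch

def points_to_voxels(points, feats, grid=(240, 240, 24), cell=0.05):
    """points: (N, 3) in the map frame (meters); feats: (N, C) per-point
    semantic scores. Returns a (C, X, Y, Z) voxel grid. index_add_ is
    differentiable w.r.t. `feats`, which is what lets a loss on the map
    reach the segmentation network."""
    X, Y, Z = grid
    C = feats.shape[1]
    idx = (points / cell).long()
    # Keep only points that fall inside the grid bounds.
    inside = ((idx >= 0) & (idx < torch.tensor([X, Y, Z]))).all(dim=1)
    idx, feats = idx[inside], feats[inside]
    flat = idx[:, 0] * (Y * Z) + idx[:, 1] * Z + idx[:, 2]   # (N,)
    vox = torch.zeros(C, X * Y * Z, dtype=feats.dtype)
    vox.index_add_(1, flat, feats.t().contiguous())
    return vox.view(C, X, Y, Z)
```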

Summing over the height dimension of the voxel representation for all obstacles, all cells, and each category gives different channels of the projected semantic map.
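The height reduction is then just a sum over the Z axis. How the obstacle channel is derived (here, thresholding column occupancy) is my assumption:

```python
def project_to_map(vox: torch.Tensor) -> torch.Tensor:
    """vox: (C, X, Y, Z). Collapse the height axis: one 2-D channel per
    category, plus an obstacle channel (the threshold is a guess; the
    paper also tracks an explored-area channel)."""
    cat_channels = vox.sum(dim=3)                          # (C, X, Y)
    obstacle = (cat_channels.sum(dim=0, keepdim=True) > 0).float()
    return torch.cat([obstacle, cat_channels], dim=0)      # (C+1, X, Y)
```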

The projected semantic map is then passed through a denoising neural network to get the final semantic map prediction. The map is aggregated over time using spatial transformations and channel-wise pooling as described in [10].
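The denoiser architecture isn't spelled out here, so below is a placeholder conv net, plus the temporal aggregation as channel-wise max pooling over the previous map (per [10]); the spatial transformation (warping the previous map by the pose change) is assumed to happen upstream:

```python
import torch
import torch.nn as nn

NUM_CHANNELS = 7  # obstacle + 6 categories; placeholder count

# Stand-in denoising network; the actual architecture is unspecified here.
denoiser = nn.Sequential(
    nn.Conv2d(NUM_CHANNELS, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, NUM_CHANNELS, 3, padding=1),
)

def aggregate(prev_map: torch.Tensor, curr_map: torch.Tensor) -> torch.Tensor:
    """Channel-wise max pooling over time; `prev_map` is assumed to be
    already warped into the current agent-centric frame."""
    return torch.max(prev_map, curr_map)
```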

The Semantic Mapping module is trained using supervised learning with cross-entropy loss on the semantic segmentation as well as semantic map prediction.

The geometric projection is implemented using differentiable operations such that the loss on the semantic map prediction can be backpropagated through the entire module if desired.
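Putting the supervision together, a sketch of the loss: standard cross-entropy on the first-person segmentation, and (my assumption) per-cell binary cross-entropy on the multi-channel map, since a cell can contain several categories. Because every projection op above is differentiable, `loss.backward()` reaches the whole module:

```python
import torch.nn.functional as F

def mapping_loss(pred_seg_logits, gt_seg, pred_map_logits, gt_map):
    """Cross-entropy on segmentation plus binary cross-entropy on the
    predicted semantic map; equal weighting is an assumption."""
    seg_loss = F.cross_entropy(pred_seg_logits, gt_seg)
    map_loss = F.binary_cross_entropy_with_logits(pred_map_logits, gt_map)
    return seg_loss + map_loss
```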