devendrachaplot / Object-Goal-Navigation

Pytorch code for NeurIPS-20 Paper "Object Goal Navigation using Goal-Oriented Semantic Exploration"
https://devendrachaplot.github.io/projects/semantic-exploration
MIT License

Tips for extracting Semantic Mapping module as a baseline #7

Closed ugurbolat closed 3 years ago

ugurbolat commented 3 years ago

Hi,

Thanks for the interesting work.

I would like to evaluate the Semantic Mapping module without the action planning/navigation part, since the dataset I want to test on already provides navigation.

As far as I've explored the project, there are two key related modules: (1) the Semantic_Mapping class, which produces the local map and agent pose predictions, and (2) the Sem_Exp_Env_Agent class, which uses the pretrained SemanticPredMaskRCNN (detectron2) for object detection and segmentation. The latter (2) seems tightly coupled to the Habitat environment code.

What would be your tips/ideas for extracting the Semantic Mapping module from the codebase? Or do you think that there won't be any straightforward solution?

Would appreciate any kind of feedback.

devendrachaplot commented 3 years ago

The Semantic Mapping pytorch model class contains the code for semantic mapping: https://github.com/devendrachaplot/Object-Goal-Navigation/blob/master/model.py#L132

If you provide the input in the correct format, it should output the semantic map.

If you do not have semantic segmentation predictions, you just need to pass the RGB observation through a pretrained Mask R-CNN to get the predictions and stack them with the RGBD input, see: https://github.com/devendrachaplot/Object-Goal-Navigation/blob/master/agents/sem_exp.py#L331 https://github.com/devendrachaplot/Object-Goal-Navigation/blob/master/agents/sem_exp.py#L310
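
A rough sketch of that stacking and a single mapping step is below; the channel layout, argument handling, and the forward signature are one reading of model.py and sem_exp.py, so treat it as a starting point rather than exact usage.

```python
import numpy as np
import torch

from arguments import get_args      # repo's argument parser (sketch assumes defaults)
from model import Semantic_Mapping  # model.py#L132

# Minimal sketch: build the stacked observation and run one mapping step.
args = get_args()
device = args.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
sem_map_module = Semantic_Mapping(args).to(device)
sem_map_module.eval()

# Per frame: RGB (H, W, 3), depth (H, W, 1), and one mask per category
# (H, W, num_sem_categories) from Mask R-CNN or any other segmenter.
rgb = np.zeros((args.frame_height, args.frame_width, 3), dtype=np.float32)
depth = np.zeros((args.frame_height, args.frame_width, 1), dtype=np.float32)
sem = np.zeros((args.frame_height, args.frame_width,
                args.num_sem_categories), dtype=np.float32)

# Stack into a (1, 4 + num_sem_categories, H, W) tensor, as in sem_exp.py.
state = np.concatenate((rgb, depth, sem), axis=2).transpose(2, 0, 1)
obs = torch.from_numpy(state).unsqueeze(0).to(device)

# Previous map/pose and the sensor pose change since the last step.
map_size = args.map_size_cm // args.map_resolution
local_map = torch.zeros(1, 4 + args.num_sem_categories, map_size, map_size).to(device)
local_pose = torch.zeros(1, 3).to(device)
pose_obs = torch.zeros(1, 3).to(device)   # (dx, dy, dtheta)

_, local_map, _, local_pose = sem_map_module(obs, pose_obs, local_map, local_pose)
```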

ugurbolat commented 3 years ago

Thanks for the leads. Let me check it out.

ugurbolat commented 3 years ago

Hi again,

After digging into the codebase and the paper, I've realized that there are no learnable parameters in the Semantic_Mapping model/class; it basically consists of projection/transformation functions that produce the bird's-eye view (no denoising network is used, as you mention in the repo). So, in theory, I could take any instance segmentation predictions (in your case Mask R-CNN) and project them along with the RGBD observation into a semantic map.
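
As a toy illustration of what I mean (my own sketch of the purely geometric step, not the repo's code): a pinhole back-projection of depth plus per-pixel labels, binned into a top-down grid.

```python
import numpy as np

def topdown_semantic_map(depth_cm, labels, hfov_deg=79.0, map_size=240,
                         cell_size_cm=5.0):
    """Toy bird's-eye-view projection (no learned parameters).
    depth_cm: (H, W) depth in cm; labels: (H, W) integer category per pixel.
    Returns a (num_categories, map_size, map_size) binary grid with the
    camera at the bottom-center looking "up" the grid."""
    H, W = depth_cm.shape
    f = (W / 2.0) / np.tan(np.deg2rad(hfov_deg) / 2.0)    # focal length, px
    xs, _ = np.meshgrid(np.arange(W) - W / 2.0, np.arange(H) - H / 2.0)
    Z = depth_cm                                          # forward distance
    X = xs * Z / f                                        # rightward offset

    num_cat = int(labels.max()) + 1
    grid = np.zeros((num_cat, map_size, map_size), dtype=np.float32)

    gx = (X / cell_size_cm + map_size / 2.0).astype(int)  # map column
    gz = (Z / cell_size_cm).astype(int)                   # map row
    valid = (Z > 0) & (gx >= 0) & (gx < map_size) & (gz >= 0) & (gz < map_size)
    grid[labels[valid], gz[valid], gx[valid]] = 1.0
    return grid
```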

To verify my understanding: can we say that the semantic mapping part is not learned?

BTW, where do you set the number of semantic categories for Mask R-CNN? Setting the --num_sem_categories argument doesn't seem to have an effect. I assume you need to map object ID differences between different datasets.
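
(For context, what I have in mind is something like the hypothetical remapping below, turning my dataset's label IDs into the per-category masks that get stacked into the observation; the IDs are made up.)

```python
import numpy as np

# Hypothetical IDs from my own dataset -> semantic-channel index expected
# when stacking masks into the observation (the real channel order would
# have to follow the category list used in this repo).
MY_ID_TO_CHANNEL = {17: 0, 23: 1, 5: 2}   # e.g. chair, couch, potted plant

def labels_to_masks(label_img, num_sem_categories):
    """(H, W) integer label image -> (H, W, num_sem_categories) binary
    masks; IDs not listed above are simply ignored."""
    masks = np.zeros((*label_img.shape, num_sem_categories), dtype=np.float32)
    for src_id, channel in MY_ID_TO_CHANNEL.items():
        masks[label_img == src_id, channel] = 1.0
    return masks
```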

EDIT: Unfortunately, extracting the Semantic Mapping module is not as straightforward as it seemed, and there are some issues with updating the map when the agent is only rotating (without moving), so I'll close the issue.