Closed by jaidevshriram 3 years ago
Hi @jaidevshriram,
The model currently depends on RGB-D + odometry inputs; I don't see a way to run it without any pose input at all. The pre-trained models are trained to handle a specific noise model in the odometry sensor (see code here). Assuming you have perfect pose estimates, or noise similar to that model, you just have to ensure your inputs are in the format the mapper expects (see here). This specific function might offer the best insight into what you're trying to achieve. The interface to the mapper module was designed to be easy to understand. In your case, you would have to provide the following variables to update the map at time t:
"rgb_at_t_1" - RGB image at t-1
"depth_at_t_1" - Depth image at t-1
"ego_map_gt_at_t_1" - Egocentric local map at t-1
"pose_at_t_1" - Estimated agent world pose at t-1 (relative to the starting position) -- this may have noise in it
"pose_hat_at_t_1" - Agent's corrected pose at t-1 -- the agent corrects for the noise (if trained to do so)
"map_at_t_1" - Full world map from 0 to t-1 (agent starts at the center of this map facing north)
"rgb_at_t" - RGB image at t
"depth_at_t" - Depth image at t
"ego_map_gt_at_t" - Egocentric local map at t
"pose_at_t" - Estimated agent world pose at t -- this may have noise in it
This should be a good starting point. I hope you can develop the code based on this. Feel free to bring up any specific questions you have here.
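To make the expected interface concrete, here is a minimal sketch of packaging observations into the dictionary the mapper consumes. The key names follow the list above; the helper name `build_mapper_inputs` and all array shapes are illustrative assumptions, not part of the OccAnt codebase.

```python
import numpy as np

def build_mapper_inputs(rgb_prev, depth_prev, ego_map_prev, pose_prev,
                        pose_hat_prev, map_prev,
                        rgb_curr, depth_curr, ego_map_curr, pose_curr):
    """Package observations for one mapper update at time t.

    Assumed (illustrative) shapes: H x W x 3 RGB, H x W x 1 depth,
    V x V x 2 egocentric map, (x, y, theta) pose, M x M x 2 world map.
    """
    return {
        "rgb_at_t_1": rgb_prev,
        "depth_at_t_1": depth_prev,
        "ego_map_gt_at_t_1": ego_map_prev,
        "pose_at_t_1": pose_prev,
        "pose_hat_at_t_1": pose_hat_prev,
        "map_at_t_1": map_prev,
        "rgb_at_t": rgb_curr,
        "depth_at_t": depth_curr,
        "ego_map_gt_at_t": ego_map_curr,
        "pose_at_t": pose_curr,
    }
```

With perfect odometry, `pose_hat_at_t_1` can simply be set equal to `pose_at_t_1`, since there is no noise to correct.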
Thanks! I think I got it working now. I was interested in using just the OccAnt model, not the mapper, so using just RGB, Depth and ego map as inputs worked.
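For anyone following the same route (feeding a custom RGB-D dataset straight into the model), a common gotcha is matching the value ranges Habitat-based models expect. This sketch normalizes raw images before the forward pass; the function name and the 0-10 m depth range are assumptions for illustration, not values taken from the OccAnt code.

```python
import numpy as np

def preprocess_rgbd(rgb_uint8, depth_m, min_depth=0.0, max_depth=10.0):
    """Scale a raw RGB-D pair into [0, 1] floats.

    Assumes rgb is uint8 H x W x 3 and depth is metric H x W x 1;
    the depth range is an illustrative default, not OccAnt's config.
    """
    rgb = rgb_uint8.astype(np.float32) / 255.0
    depth = np.clip(depth_m, min_depth, max_depth)
    depth = (depth - min_depth) / (max_depth - min_depth)
    return rgb, depth.astype(np.float32)
```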
Glad to hear that, @jaidevshriram. I'm closing this for now; please feel free to reopen it if you have further questions.
Hey!
I'm trying to run the OccAnt model on my own dataset with the aim of generating top-view maps - I'm not concerned with the RL pipeline or pose estimation at the moment. How can I go about loading a pretrained model and running just the mapper module? The RL pipeline seems tightly coupled with the mapper, so I'm having trouble using my dataset.
Here's what I've tried so far:
However, the mapper module seems to require inputs that come from the Habitat sensors, while my dataset is just a collection of RGB+D images.
Ideally, I can just isolate the OccAntRGBD module, use the pretrained weights, and do a simple forward pass with RGB+D input.
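One way to isolate a submodule from a full pretrained checkpoint is to filter its state dict by key prefix before loading it into the standalone module. The prefix "mapper." below is a hypothetical example - the actual prefix depends on how the checkpoint's keys are named.

```python
def extract_submodule_state(state_dict, prefix="mapper."):
    """Keep only the parameters under `prefix`, stripping the prefix
    so the result can be loaded into the isolated submodule with
    submodule.load_state_dict(...).

    `prefix` is a hypothetical key prefix; inspect the checkpoint's
    keys to find the real one.
    """
    return {k[len(prefix):]: v
            for k, v in state_dict.items()
            if k.startswith(prefix)}
```

Loading with `strict=True` afterwards is a useful sanity check that every expected parameter was captured by the prefix.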
Would appreciate any help on this front.
Thanks!