Mapper training during policy training

facebookresearch / OccupancyAnticipation

This repository contains code for our publication "Occupancy Anticipation for Efficient Exploration and Navigation" in ECCV 2020.

MIT License


Mapper training during policy training #29

Closed fraazor closed 3 years ago

fraazor commented 3 years ago

Hi there,

I have been working on a variation of the projection unit that adds a different type of "fog of war" approach for the sensors. However, I do not fully understand the implementation, because mapper training and policy training seem to happen simultaneously. Wouldn't that lead to sorted, correlated data/label pairs in the supervised part? Is there some shuffling happening that I am missing? I would really appreciate an explanation of how this was approached.

srama2512 commented 3 years ago

We have a large replay buffer that stores data over multiple episodes. We randomly sample data from this buffer to break the correlation in data.
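To illustrate the idea (this is a minimal sketch, not the code from this repository), a replay buffer that stores data/label pairs across episodes and returns uniformly random batches could look like the following. The `ReplayBuffer` class, its `capacity` and `seed` parameters, and the fake episode data are all hypothetical names chosen for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (observation, label) pairs (hypothetical helper)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque evicts the oldest entries once capacity is reached.
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, observation, label):
        self._data.append((observation, label))

    def __len__(self):
        return len(self._data)

    def sample(self, batch_size):
        """Draw a batch uniformly at random, without replacement."""
        batch_size = min(batch_size, len(self._data))
        if batch_size == 0:
            return [], []
        batch = self._rng.sample(list(self._data), batch_size)
        observations, labels = zip(*batch)
        return list(observations), list(labels)


# Fill the buffer with time-ordered data from several (fake) episodes.
buffer = ReplayBuffer(capacity=1000, seed=0)
for episode in range(5):
    for step in range(100):
        buffer.add((episode, step), f"label-{episode}-{step}")

# A sampled batch mixes steps from different episodes, so consecutive
# training examples are no longer ordered in time.
observations, labels = buffer.sample(batch_size=32)
episodes_in_batch = {ep for ep, _ in observations}
```

Because each batch is drawn from the whole buffer rather than from the most recent steps, the supervised mapper update sees a mix of episodes and time steps even though the data was collected sequentially during policy rollouts.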

fraazor commented 3 years ago

Thanks a lot. I might have to train them separately, since I do not have enough memory for such a large replay buffer.