Closed (fraazor closed this 3 years ago)
We have a large replay buffer that stores data over multiple episodes. We randomly sample data from this buffer to break the correlation in data.
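For readers following along, here is a minimal sketch of that idea. It is not the code from this repository: the class name `ReplayBuffer`, its methods, and the `(observation, label)` pair layout are all hypothetical, chosen only for illustration. It assumes a fixed-capacity buffer that overwrites its oldest entries and draws uniform random mini-batches, so consecutive steps of one episode rarely end up in the same batch.

```python
import random


class ReplayBuffer:
    """Hypothetical fixed-size buffer of (observation, label) pairs."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self._storage = []                # pairs collected across episodes
        self._next = 0                    # slot to overwrite once the buffer is full
        self._rng = random.Random(seed)   # private RNG so sampling is reproducible

    def __len__(self):
        return len(self._storage)

    def add(self, observation, label):
        """Store one pair, overwriting the oldest one when at capacity."""
        item = (observation, label)
        if len(self._storage) < self.capacity:
            self._storage.append(item)
        else:
            self._storage[self._next] = item
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size):
        """Return a uniformly random batch, drawn without replacement."""
        k = min(batch_size, len(self._storage))
        return self._rng.sample(self._storage, k)
```

Sampling uniformly from the whole buffer is what decorrelates the batch; a smaller `capacity` lowers memory use, at the cost of each batch being drawn from a shorter, more correlated window of recent data.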
Thanks a lot. I might have to train them separately, since I do not have enough memory for such a large replay buffer.
Hi there,
I have been working on a variation of the projection unit that adds a different kind of "fog of war" approach for the sensors. However, I do not fully understand the code, because the mapper training and the policy training seem to happen simultaneously. Wouldn't that lead to ordered, correlated data/label pairs in the supervised part? Is there some shuffling happening that I am missing? I would really appreciate an explanation of how this was approached.