AI-ON / Multitask-and-Transfer-Learning

Benchmark and build RL architectures that can do multitask and transfer learning.

Neural Map: Structured Memory for Deep Reinforcement Learning #3

Closed: deontologician closed this issue 7 years ago

deontologician commented 7 years ago

https://arxiv.org/abs/1702.08360

A critical component to enabling intelligent reasoning in partially observable environments is memory. Despite this importance, Deep Reinforcement Learning (DRL) agents have so far used relatively simple memory architectures, with the main methods to overcome partial observability being either a temporal convolution over the past k frames or an LSTM layer. More recent work (Oh et al., 2016) has gone beyond these architectures by using memory networks which allow more sophisticated addressing schemes over the past k frames. But even these architectures are unsatisfactory because they are limited to remembering information from only the last k frames. In this paper, we develop a memory system with an adaptable write operator that is customized to the sorts of 3D environments that DRL agents typically interact with. This architecture, called the Neural Map, uses a spatially structured 2D memory image to learn to store arbitrary information about the environment over long time lags. We demonstrate empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments and show that it is capable of generalizing to environments that were not seen during training.
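For reference, the map supports both a global read (a conv network summarizing the whole memory) and an associative context read over every cell. Below is a minimal numpy sketch of the context-read idea; the shapes, names, and plain dot-product scoring are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def context_read(memory, query):
    """Attention over every (x, y) cell of a C x H x W memory image:
    score each stored vector against a query, softmax over cells, and
    return the weighted sum as the recalled context vector."""
    C, H, W = memory.shape
    flat = memory.reshape(C, H * W)        # one column per map cell
    scores = query @ flat                  # dot-product similarity per cell
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()
    return flat @ alpha                    # C-dim context vector

# toy usage with made-up sizes
memory = np.random.randn(32, 15, 15).astype(np.float32)
query = np.random.randn(32).astype(np.float32)
context = context_read(memory, query)
```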

deontologician commented 7 years ago

Blah, I was excited for this one and it was a bit of a letdown. What they do is create an associative memory that is isomorphic to the actual playfield in the game. Then they let the agent store an arbitrary feature vector at a position that corresponds to the position of the agent. How do they get the position of the agent? Well, they cheat: they just take it out of the game engine and hand it to the agent architecture. They have a section at the end where they address this issue, but their fix is to switch to agent-relative movements (so if the agent moves forward, you shift the write location by some delta). Both variants are sketched below.
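Concretely, the addressing works something like this (numpy, with made-up sizes; the paper's actual write is a learned, GRU-style update, replaced here with a plain overwrite to isolate the addressing):

```python
import numpy as np

C, H, W = 32, 15, 15                       # made-up feature/map sizes
memory = np.zeros((C, H, W), dtype=np.float32)

def write_absolute(memory, feature, x, y):
    """Oracle variant: (x, y) comes straight from the game engine."""
    memory[:, y, x] = feature
    return memory

def write_egocentric(memory, feature, pos, delta):
    """Agent-relative variant from the paper's last section: shift the
    current write location by the agent's own movement (dx, dy)."""
    x, y = pos
    dx, dy = delta
    x, y = (x + dx) % W, (y + dy) % H      # wrap at the edges for the sketch
    memory[:, y, x] = feature
    return memory, (x, y)

# the agent steps forward one cell and stores what it currently sees
feature = np.random.randn(C).astype(np.float32)
memory, pos = write_egocentric(memory, feature, pos=(7, 7), delta=(0, 1))
```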

It's neat in that it allows the agent to have some kind of spatial memory (think memory palace), but I'm skeptical it will do well on multitask or transfer, since the maps that are created are very clearly tied to a specific game map. This might yield fruitful work eventually, but it's too raw to be useful in the near future. It's not a robust solution, because you need an oracle to provide either the dimensions of the world you're in or the exact coordinates you're at.

  1. Relevance to transfer learning
    • How likely is it that this paper improves knowledge transfer between different tasks?
    • Not much relevance; they don't discuss it, and there's no reason to think it will be good for transfer.
  2. Relevance to multitask learning
    • How likely is it that this paper reduces forgetting across multiple tasks?
    • It may improve it somewhat, since it adds a memory, but because the maps are so tied to the specific game, I'm bearish that it will deliver significant multitask performance.
  3. Adaptability to the problem domain
    • How would the techniques described in the paper be used in an RL agent?
    • The paper details this directly, since it is itself an RL technique.
  4. Biological plausibility
    • Gut feeling, what is the chance the brain is doing something along these lines?
    • 0% getting global coordinates from an oracle
    • 5% doing some kind of spatial processing akin to the ego-centric section
    • If you don't think it's biologically plausible, does that matter?
  5. Paper bonus points
    • [X] Model-based RL
    • [X] Attention
    • [ ] Meta-learning
    • [ ] Online / continuous learning
    • [ ] Tests on Atari
    • [X] Improves representations

Recommendation: strong irrelevant

mryellow commented 7 years ago

Well they cheat ... very clearly tied to a specific game map

I've been keenly observing anything which can be paired with RatSLAM for real-world applications.

The cheating doesn't matter so much; you can get those x/y/z coordinates from a RatSLAM Posecell network. With GridCell wrapping from one side to the other, the size of the environment matters less, only the size of the network representing it.

or the exact coordinates you're at

Your max activation over any 3D vector is something; it shouldn't really need to be exact if it helps solve the task, given that the memory is read again when that cell activates most strongly as the target location.
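For instance, assuming a RatSLAM-style pose-cell activation volume over (x, y, heading) bins (all names and shapes here are made up to keep the point concrete):

```python
import numpy as np

# hypothetical pose-cell activations over (x, y, heading) bins
pose_cells = np.random.rand(20, 20, 36)

def decode_pose(activations):
    """Take the max-activation cell as the pose estimate. It needn't be
    metrically exact; what matters is that the same cell wins again when
    the agent revisits the location, so reads and writes line up."""
    x, y, theta = np.unravel_index(np.argmax(activations), activations.shape)
    return x, y, theta

x, y, theta = decode_pose(pose_cells)
# (x, y) could then stand in for the game engine's oracle coordinates
# as the Neural Map read/write address.
```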

edit: Just passing by and adding perspective; I'm not overly interested in the benchmarking findings on these things, only ad hoc integration into hobby experiments. Was searching for code and ended up here.