Comment:
Published in June 2019 by DeepMind. It uses a multi-head attention mechanism in an RL setting, with heavy reliance on CNNs and LSTMs. It is mainly designed to capture image sequences or motion; probably not applicable to other settings?
The attention model is modified based on:
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In ICML, 2015.
and
Drew A. Hudson and Christopher D. Manning. Compositional Attention Networks for Machine Reasoning. arXiv:1803.03067, 2018.
Problem:
Previous efforts to visualize deep RL agents [1–3] focus on generating saliency maps to understand the magnitude of policy changes as a function of a perturbation of the input.
In this work we apply a soft, top-down, spatial attention mechanism to visual information in a reinforcement learning setting. The model enables us to build agents that actively select important, task-relevant information from visual inputs by sequentially querying and receiving compressed, query-dependent summaries to generate appropriate outputs.
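The query-and-summarize step can be sketched roughly as follows. This is a minimal NumPy illustration of one soft spatial attention read, assuming scaled dot-product scoring; the paper's actual model additionally uses spatial basis vectors, separate key/value channels, and multiple heads, so treat this as an assumption-laden simplification, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_attention_read(features, query):
    """One soft, top-down attention read (simplified sketch).

    features: (H, W, C) spatial feature map, e.g. CNN output.
    query:    (C,) top-down query vector, e.g. produced from an LSTM state.
    Returns a (C,) query-dependent summary and the (H, W) attention map.
    """
    H, W, C = features.shape
    keys = features.reshape(H * W, C)       # each spatial location acts as key/value
    logits = keys @ query / np.sqrt(C)      # scaled dot-product compatibility scores
    weights = softmax(logits)               # soft attention over all H*W locations
    summary = weights @ keys                # compressed, query-dependent summary
    return summary, weights.reshape(H, W)
```

A multi-head version simply issues several queries in parallel and concatenates the resulting summaries; the attention maps are what make the agent's focus visualizable.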
Link: https://arxiv.org/pdf/1906.02500.pdf Video: https://sites.google.com/view/s3ta
Innovation: