falcondai / falcondai.github.io

my personal website and blog
https://falcond.ai
MIT License

Attention in A2C #28

Open falcondai opened 4 years ago

falcondai commented 4 years ago

Write a blog post about the visualization of A2C playing Atari Pong. Most of the time the available actions seem to be about the same (the horizon is limited by gamma), and only rarely is a specific action intended. This can motivate the study of semi-Markov decision processes: we can decide to take no action in anticipation of the transit of the ball.
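A rough way to make both observations concrete, as a sketch only: the effective horizon implied by gamma is commonly approximated as 1 / (1 - gamma), and one possible proxy for "a specific action is intended" is low entropy of the policy's action distribution. The function names, the entropy threshold, and the example probabilities below are all hypothetical choices for illustration, not something taken from the A2C run itself.

```python
import math


def effective_horizon(gamma):
    """Approximate number of steps over which rewards still matter: 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def policy_entropy(probs):
    """Shannon entropy (in nats) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_decisive(probs, threshold=0.5):
    """True when the entropy is below `threshold` times the uniform (maximum) entropy.

    The 0.5 default is an arbitrary illustrative cutoff.
    """
    max_entropy = math.log(len(probs))
    return policy_entropy(probs) < threshold * max_entropy


# Hypothetical action distributions over six actions.
indifferent = [1.0 / 6.0] * 6                      # all actions about the same
intended = [0.95, 0.01, 0.01, 0.01, 0.01, 0.01]    # one action clearly preferred
```

Plotting `is_decisive` (or the raw entropy) over the frames of an episode would be one way to show how rarely the agent commits to a specific action.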

falcondai commented 4 years ago

It is not that we have a no-op button, but rather that the decision-making routine can fire off an option to consume the transit time, essentially teleporting itself to the next interesting moment.
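A minimal sketch of that idea, assuming a toy one-dimensional "ball transit" in place of Pong: an option is run until its termination condition fires, and the caller only sees the state where it ended, the discounted reward collected on the way, and the number of primitive steps consumed. The names `run_option`, `smdp_target`, `toy_step`, `wait_policy`, and `ball_is_close` are all hypothetical, and the toy dynamics are invented for illustration.

```python
def run_option(state, step, policy, should_terminate, gamma, max_steps=1000):
    """Execute a fixed option until it terminates.

    Returns (final_state, discounted_reward_collected, primitive_steps_taken).
    """
    total, discount, k = 0.0, 1.0, 0
    while k < max_steps:
        state, reward = step(state, policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if should_terminate(state):
            break
    return state, total, k


def smdp_target(option_return, k, gamma, next_value):
    """Semi-Markov backup target: reward during the option plus gamma**k times the next value."""
    return option_return + (gamma ** k) * next_value


# Toy stand-in for the ball's transit: the state is the ball's distance to the paddle.
def toy_step(distance, action):
    return distance - 1, 0.0   # the ball approaches by one unit; no reward in transit


def wait_policy(distance):
    return "stay"              # the option's fixed reactive policy


def ball_is_close(distance):
    return distance <= 2       # the "next interesting moment"


final_state, option_return, duration = run_option(
    10, toy_step, wait_policy, ball_is_close, gamma=0.99
)
```

From the decision maker's point of view the eight primitive steps collapse into a single transition, discounted by `gamma ** duration` in `smdp_target`.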

falcondai commented 4 years ago

But where is the attention (or even consciousness)? Isn't executing the option as attentive as executing a policy? The distinction may only make sense in an online setting, where an online learning algorithm is being executed and the option is a fixed routine (a reactive policy). The difference is that the learning algorithm can change its decision in the same state on a different visit, whereas a fixed policy cannot.

falcondai commented 4 years ago

Relates to Bengio's talk at NeurIPS'19