Model-based hierarchical reinforcement learning and human action control. By: Matthew M. Botvinick, Ari Weinstein

Link: Semantic Scholar

This paper was published on 05 November 2014. Its main focus is to propose combining model-based (MB) RL with hierarchical RL (HRL), in the hope that the combination can mimic human behavior.

Problem: The inception of this effect was, of course, the discovery that the dynamics of dopamine release, as well as certain dopamine-dependent forms of learning, could be neatly modeled in terms of temporal-difference algorithms for RL. (Yu: The latest Google paper, #21, also suggests that dopamine works as distributional RL.)
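For reference, the standard TD(0) quantities behind the dopamine comparison can be written as below. This is the textbook form, not notation taken from the paper; the symbols (value estimate $V$, reward $r_{t+1}$, discount $\gamma$, learning rate $\alpha$) are assumptions of this note.

$$
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t
$$

The prediction error $\delta_t$ is the term that phasic dopamine activity is usually compared with: it is positive when an outcome is better than expected and negative when it is worse.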

An important example is the contrast between model-free and model-based RL [3,4]. In computational terms, model-free RL assumes that learning occurs without access to any internal representation of the causal structure of the environment. Rather than building such an internal model, the agent instead simply stores estimates for the expected values of the actions available in each state or context, shaped by a history of direct interaction with the environment. In model-based RL, in contrast, the agent does possess an internal model, one that both predicts action outcomes and estimates the immediate reward associated with specific situations. Decisions are made not on the basis of stored action values, but instead through planning: the prospective use of the internal model to simulate and compare candidate lines of behaviour.
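To make the contrast concrete, here is a minimal Python sketch (not from the paper). The 4-state chain, the names `step`, `q_learning`, and `plan`, and all hyperparameters are hypothetical choices for illustration. The model-free agent only caches action values learned from interaction; the model-based agent is simply handed a perfect model (an assumption, since a real agent would have to learn it) and chooses by simulating action sequences.

```python
import random
from collections import defaultdict

# Hypothetical toy task: states 0..3 on a chain; entering state 3 pays reward 1.
ACTIONS = ["left", "right"]
GOAL = 3
GAMMA = 0.9


def step(state, action):
    """Environment dynamics: returns (next_state, immediate_reward)."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: store action-value estimates shaped by direct interaction."""
    rng = random.Random(seed)
    q = defaultdict(float)  # cached value for each (state, action) pair
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                # Greedy choice on cached values, ties broken at random.
                a = max(ACTIONS, key=lambda act: (q[(s, act)], rng.random()))
            s2, r = step(s, a)
            # No bootstrapping from the terminal state.
            future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + GAMMA * future - q[(s, a)])
            s = s2
    return q


def plan(state, model, depth):
    """Model-based: simulate candidate lines of behaviour with an internal model.

    `model(state, action)` must return (predicted_next_state, predicted_reward).
    Returns (best_value, best_first_action) for a depth-limited lookahead.
    """
    if depth == 0 or state == GOAL:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        nxt, r = model(state, a)
        future, _ = plan(nxt, model, depth - 1)
        value = r + GAMMA * future
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```

Calling `q_learning()` returns a table whose greedy action can be read off per state, while `plan(0, step, depth=3)` picks an action without any stored values, by searching the model three steps ahead. The planner needs no learning episodes but pays for each decision with simulation, which is the trade-off the paragraph above describes.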
