NorbertZheng / read-papers

My paper reading notes.

arXiv '22 | The Quest for a Common Model of the Intelligent Decision Maker. #19

Closed NorbertZheng closed 2 years ago

NorbertZheng commented 2 years ago

Richard S. Sutton. The Quest for a Common Model of the Intelligent Decision Maker.

NorbertZheng commented 2 years ago

Abstract

The premise of the Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) is that multiple disciplines share an interest in goal-directed decision making over time. Here, Sutton deepens this idea by proposing a perspective on the decision maker that is substantive and widely held across disciplines, which he calls the common model of the intelligent agent. The common model includes:

Sutton is trying to devise a neutral terminology that can be used across disciplines, and to build on the convergence of multiple diverse disciplines toward a substantive common model of the intelligent agent.

NorbertZheng commented 2 years ago

The Quest

The natural sciences of psychology, neuroscience, and ethology, the engineering sciences of artificial intelligence, optimal control theory, and operations research, and the social sciences of economics and anthropology all focus in part on intelligent decision makers. The perspectives of the various disciplines differ, but they have common elements. One cross-disciplinary goal is to identify the common core: those aspects of the decision maker that are shared by all or many of the disciplines. Many scientific insights have come from cross-disciplinary interactions, such as the now-widespread use of Bayesian methods in psychology, the reward-prediction-error interpretation of dopamine in neuroscience, and the longstanding use of the neural-network metaphor in machine learning. In this short paper, Sutton hopes to advance the quest in the following small ways:

NorbertZheng commented 2 years ago

Interface Terminology

The decision-maker makes its decisions over time, which may be divided into discrete steps at each of which:

What terminology shall we use for the signals and for the entities exchanging them?

The components of the system:

[Image: Screenshot_20220302_164144]
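As a hedged sketch of this interface terminology (my own illustration, not from the paper; the class and method names are assumptions), the step-wise exchange of signals between the agent and the world might look like:

```python
from dataclasses import dataclass
import random

@dataclass
class Step:
    observation: float  # signal sent from the world to the agent
    reward: float       # scalar goal signal, generated in the world

class World:
    """Toy world: the observation is a noisy state; reward is higher near 0."""
    def __init__(self):
        self.state = 0.0

    def step(self, action: float) -> Step:
        self.state += action + random.gauss(0.0, 0.1)
        return Step(observation=self.state, reward=-abs(self.state))

class Agent:
    """Toy agent: acts to push the observed state back toward 0."""
    def act(self, observation: float) -> float:
        return -0.5 * observation

# Discrete-step interaction loop: action goes out, observation and reward come back.
world, agent = World(), Agent()
obs = 0.0
for t in range(10):
    action = agent.act(obs)
    step = world.step(action)
    obs = step.observation
```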

NorbertZheng commented 2 years ago

Additive Rewards

Most disciplines formulate the agent's goal in terms of a scalar signal generated outside the agent's direct control, and thus we place its generation, formally, in the world. In the general case, this signal arrives on every step and the goal is to maximize its sum. Such additive rewards may be used to formulate the goal as:
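As a hedged aside (these are standard formulations from the RL literature, not quoted from the paper), the additive-reward goal is usually written as maximizing one of the following returns:

```latex
% Finite-horizon sum of rewards
G_t = R_{t+1} + R_{t+2} + \dots + R_{T}

% Discounted return, with discount factor 0 \le \gamma < 1
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% Average reward per step
\bar{R} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} R_{t}
```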

A simpler but still popular notion of the goal is a state of the world to be reached. This allows much more concrete inference, but it is less general than additive rewards. For example, it cannot:

Maybe we can use the successor representation (#11) to fulfill this goal, e.g. using SR for mental simulation to find a possible path to the goal state. We could also integrate reward as part of the state, e.g. learning a latent state space rather than receiving an identified state space as in the control-theory domain.
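A minimal sketch of the successor-representation idea (my own illustration, assuming a small tabular MDP with a fixed policy; not from the notes): the SR matrix M = (I − γP)⁻¹ summarizes expected discounted future state occupancies, and composing it with any reward vector gives state values without re-planning.

```python
import numpy as np

# Policy-conditioned transition matrix P[s, s'] for a 4-state chain.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],  # absorbing goal state
])
gamma = 0.95

# Successor representation: expected discounted future occupancy of each state.
M = np.linalg.inv(np.eye(4) - gamma * P)

# Any reward vector can be plugged in after the fact ("reward as part of state").
r_goal = np.array([0.0, 0.0, 0.0, 1.0])   # reward only at the goal state
V = M @ r_goal                             # state values under the fixed policy

print(V)  # states closer to the goal have higher value
```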

NorbertZheng commented 2 years ago

Additive rewards have a long inter-disciplinary history:

It seems that I need to understand the stability of grid cells, which are rarely coupled with object-vector cells; the entorhinal cortex appears to encode reward and state independently.

NorbertZheng commented 2 years ago

Standard Components of the Decision-Making Agent

Here, Sutton has opted to include in the agent only the most essential elements for which there is widespread (albeit not universal) agreement within and across disciplines, and to describe them only in general terms. The proposed common model of the internal structure of the agent has four principal components (perception, reactive policy, value function, and transition model), which are interconnected by a central signal, the subjective state (seemingly from a Bayesian perspective):

[Image: Screenshot_20220302_195256]
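A hedged structural sketch of the four-component agent (my own reading of the diagram above; the names and signatures are assumptions, not the paper's API):

```python
from typing import Any

class CommonModelAgent:
    """Sketch of the common model's internal structure: perception, reactive
    policy, value function, and transition model, all reading and writing the
    central subjective-state signal that ties them together."""

    def __init__(self, perception, policy, value_fn, transition_model):
        self.perception = perception              # observation -> subjective-state update
        self.policy = policy                      # subjective state -> action
        self.value_fn = value_fn                  # subjective state -> estimated return
        self.transition_model = transition_model  # (state, action) -> predicted next state / reward
        self.state: Any = None                    # the central subjective-state signal

    def step(self, observation, reward):
        # Perception updates the subjective state from the latest observation.
        self.state = self.perception(self.state, observation)
        # The reactive policy chooses the next action from the subjective state.
        action = self.policy(self.state)
        # The value function and transition model also operate on the subjective
        # state, for learning and planning (omitted in this sketch).
        return action
```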

NorbertZheng commented 2 years ago

Limitations and Assessment

There is no explicit role in it for predictions of observations other than reward. Note that the transition model does not perform prediction actively, e.g. the agent is explicitly trained to predict future reward rather than the next observation, and as formulated it still has no direct relationship with the successor representation.