Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making. It is distinguished from other computational approaches by its emphasis on learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment. In our opinion, reinforcement learning is the first field to seriously address the computational issues that arise when learning from interaction with an environment in order to achieve long-term goals.
Reinforcement learning uses a formal framework defining the interaction between a learning agent and its environment in terms of states, actions, and rewards. This framework is intended to be a simple way of representing essential features of the artificial intelligence problem. These features include a sense of cause and effect, a sense of uncertainty and nondeterminism, and the existence of explicit goals.
The concepts of value and value functions are the key features of most of the reinforcement learning methods that we consider in this book. We take the position that value functions are important for efficient search in the space of policies. The use of value functions distinguishes reinforcement learning methods from evolutionary methods that search directly in policy space guided by scalar evaluations of entire policies.
The most important feature distinguishing reinforcement learning from other types of learning is that it uses training information that evaluates the actions taken rather than instructs by giving correct actions.
| Evaluative feedback | Instructive feedback |
| --- | --- |
| basis of methods for function optimization | indicates the correct action to take |
| depends entirely on the action taken | independent of the action taken |
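To make the contrast concrete, here is a minimal sketch using a hypothetical 3-armed bandit (the action values below are made up): evaluative feedback is a reward for whichever arm was actually pulled, while instructive feedback names the best arm regardless of what was pulled.

```python
import random

# Hypothetical 3-armed bandit, used only to illustrate the distinction.
true_action_values = [1.0, 2.5, 0.3]   # assumed expected reward of each arm
best_action = max(range(3), key=lambda a: true_action_values[a])

action = random.randrange(3)           # the agent pulls some arm

# Evaluative feedback: a noisy reward for the action actually taken.
# It says how good that action was, not whether it was the best possible,
# so it depends entirely on the chosen action.
reward = random.gauss(true_action_values[action], 1.0)

# Instructive feedback: the correct action, independent of the action taken.
# This is the kind of signal supervised learning would train on.
label = best_action

print(action, reward, label)
```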
The purpose or goal of the agent is formalized in terms of a special reward signal passing from the environment to the agent.
reward hypothesis: That all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal.
We have said that the agent’s goal is to maximize the cumulative reward it receives in the long run.
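In the book's standard notation, "cumulative reward in the long run" is made precise as the discounted return, where $\gamma \in [0, 1]$ is the discount rate:

```latex
G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
```

The agent's objective is to choose actions so as to maximize the expected value of $G_t$.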
By “the state” we mean whatever information is available to the agent. Ideally, we would like a state signal that summarizes past sensations compactly, yet in such a way that all relevant information is retained. This normally requires more than the immediate sensations, but never more than the complete history of all past sensations. A state signal that succeeds in retaining all relevant information is said to be Markov, or to have the Markov property.
A reinforcement learning task that satisfies the Markov property is called a Markov decision process, or MDP. If the state and action spaces are finite, then it is called a finite Markov decision process (finite MDP).
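In the book's notation, the Markov property means that the environment's response at time $t+1$ depends only on the state and action at time $t$, not on the rest of the history, so a finite MDP is completely specified by its one-step dynamics:

```latex
\Pr\{S_{t+1}=s',\, R_{t+1}=r \mid S_0, A_0, R_1, \ldots, R_t, S_t=s, A_t=a\}
  = \Pr\{S_{t+1}=s',\, R_{t+1}=r \mid S_t=s, A_t=a\}
  \doteq p(s', r \mid s, a)
```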
1.3 Elements of Reinforcement Learning
Policy: a mapping from perceived states of the environment to actions to be taken when in those states; it defines the learning agent's way of behaving at a given time.
Reward signal
A reward signal defines the goal in a reinforcement learning problem:
The agent’s sole objective is to maximize the total reward it receives over the long run.
On each time step, the environment sends to the reinforcement learning agent a single number, a reward.
The only way the agent can influence the reward signal is through its actions, which can have a direct effect on reward, or an indirect effect through changing the environment's state.
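A minimal sketch of this interaction loop (the `GridEnv` environment and the random placeholder policy below are hypothetical, invented only to illustrate how actions are the agent's sole lever on the reward it accumulates):

```python
import random

class GridEnv:
    """Toy stand-in environment (hypothetical, for illustration only)."""
    def reset(self):
        self.pos = 0
        return self.pos                          # initial state

    def step(self, action):
        # action 1 moves right, anything else moves left, clamped to [0, 4]
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        reward = 1.0 if self.pos == 4 else 0.0   # a single number on each step
        done = self.pos == 4
        return self.pos, reward, done            # next state, reward, terminal flag

def run_episode(env):
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = random.choice([0, 1])           # placeholder policy: act at random
        state, reward, done = env.step(action)   # actions are the only way to influence reward
        total_reward += reward                   # cumulative reward the agent tries to maximize
    return total_reward

print(run_episode(GridEnv()))
```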
Value function
Whereas the reward signal indicates what is good in an immediate sense, a value function specifies what is good in the long run.
Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
Although rewards are in a sense primary and values, as predictions of rewards, are secondary, it is values with which we are most concerned when making and evaluating decisions.
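In the book's notation, the value of a state $s$ under a policy $\pi$ is the expected return when starting in $s$ and following $\pi$ thereafter:

```latex
v_\pi(s) \doteq \mathbb{E}_\pi\!\left[\, G_t \mid S_t = s \,\right]
        = \mathbb{E}_\pi\!\left[\, \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;\middle|\; S_t = s \,\right]
```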
Reward and value: a method for efficiently estimating values is the most important component of almost all reinforcement learning algorithms.
A model of the environment: something that mimics the behavior of the environment and allows inferences about how it will behave; models are used for planning, that is, deciding on a course of action by considering possible future situations before they are actually experienced.
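A minimal sketch of what a model provides, assuming a hypothetical tabular model that maps a state-action pair to a predicted next state and reward (the states and rewards below are made up); planning then means rolling candidate action sequences through the model instead of the real environment:

```python
# Hypothetical tabular model: (state, action) -> (predicted next state, predicted reward).
model = {
    ("s0", "right"): ("s1", 0.0),
    ("s1", "right"): ("s2", 1.0),
    ("s0", "left"):  ("s0", 0.0),
}

def simulate(state, plan):
    """Roll a plan through the model instead of the real environment (planning)."""
    total = 0.0
    for action in plan:
        state, reward = model[(state, action)]
        total += reward
    return state, total

print(simulate("s0", ["right", "right"]))   # -> ('s2', 1.0)
```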