FrancisLeon / Reinforement-Learning-


RL book #3

Open FrancisLeon opened 7 years ago

FrancisLeon commented 7 years ago

1.3 Elements of Reinforcement Learning

FrancisLeon commented 7 years ago

1.4 Limitations and Scope

FrancisLeon commented 7 years ago

1.6 Summary

Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making. It is distinguished from other computational approaches by its emphasis on learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment. In our opinion, reinforcement learning is the first field to seriously address the computational issues that arise when learning from interaction with an environment in order to achieve long-term goals.

Reinforcement learning uses a formal framework defining the interaction between a learning agent and its environment in terms of states, actions, and rewards. This framework is intended to be a simple way of representing essential features of the artificial intelligence problem. These features include a sense of cause and effect, a sense of uncertainty and nondeterminism, and the existence of explicit goals.
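The interaction described above can be sketched as a simple loop. This is only an illustrative sketch: `CoinFlipEnv`, `run_episode`, and the reward rule are hypothetical names and assumptions made up for this note, not something taken from the book.

```python
import random


class CoinFlipEnv:
    """Hypothetical two-state environment, used only to illustrate the loop."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Reward 1.0 when the action matches the current state, else 0.0.
        reward = 1.0 if action == self.state else 0.0
        # The next state is drawn at random, so outcomes are nondeterministic.
        self.state = self.rng.randrange(2)
        return self.state, reward


def run_episode(env, policy, steps=10):
    """Run the agent-environment loop; return a list of (state, action, reward)."""
    trace = []
    state = env.state
    for _ in range(steps):
        action = policy(state)                  # agent picks an action from the state
        next_state, reward = env.step(action)   # environment answers with state and reward
        trace.append((state, action, reward))
        state = next_state
    return trace


# A policy that copies the state as its action always earns the reward here.
trace = run_episode(CoinFlipEnv(), policy=lambda s: s, steps=5)
total_reward = sum(r for _, _, r in trace)
```

The agent never sees how the environment picks the next state; it only observes states and rewards, which is the point of the framework.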

The concepts of value and value functions are the key features of most of the reinforcement learning methods that we consider in this book. We take the position that value functions are important for efficient search in the space of policies. Their use of value functions distinguishes reinforcement learning methods from evolutionary methods that search directly in policy space guided by scalar evaluations of entire policies.

FrancisLeon commented 7 years ago

2 Multi-arm Bandits

Preface

The most important feature distinguishing reinforcement learning from other types of learning is that it uses training information that evaluates the actions taken rather than instructs by giving correct actions.

| Evaluative feedback | Instructive feedback |
| --- | --- |
| basis of methods for function optimization | indicates the correct action to take |
| depends entirely on the action taken | independent of the action taken |

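
The contrast in the table can be sketched with a toy bandit. Everything here is an assumption for illustration: the function names, the `true_means` values, and the unit-variance Gaussian noise are hypothetical, not from the book's text quoted above.

```python
import random


def evaluative_feedback(true_means, action, rng):
    """Return a noisy reward for the chosen action only (depends on the action)."""
    return rng.gauss(true_means[action], 1.0)


def instructive_feedback(true_means, action):
    """Return the index of the correct action; the action taken is ignored."""
    return max(range(len(true_means)), key=lambda a: true_means[a])


rng = random.Random(0)
true_means = [0.2, 1.5, -0.3]  # hypothetical expected rewards, one per arm

# Instructive feedback is the same label whatever the learner did.
labels = {instructive_feedback(true_means, a) for a in range(len(true_means))}

# Evaluative feedback must be sampled per action and averaged to be compared.
avg_reward = [
    sum(evaluative_feedback(true_means, a, rng) for _ in range(2000)) / 2000
    for a in range(len(true_means))
]
```

With evaluative feedback the learner has to try each arm to find out which is best, while the instructive signal names the best arm directly.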
FrancisLeon commented 7 years ago

2.1 An n-Armed Bandit Problem

FrancisLeon commented 7 years ago

3 Finite Markov Decision Processes

Reward hypothesis: all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal, called the reward.

3.3 Returns

We have said that the agent’s goal is to maximize the cumulative reward it receives in the long run.
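
As a rough sketch of what "cumulative reward" means in code, the return can be computed from a list of rewards. The discount factor `gamma` is an assumption added here (it is not mentioned in the notes above), and the function names are hypothetical.

```python
def episodic_return(rewards):
    """Undiscounted return: the plain sum of the rewards received."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Discounted return: rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2] + ...

    Computed backwards so each step is g = r + gamma * g.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 0.0, 2.0]  # hypothetical rewards following some time step
g_plain = episodic_return(rewards)
g_half = discounted_return(rewards, gamma=0.5)
```

Setting `gamma=1.0` makes the discounted version agree with the plain sum, so the two functions describe the same quantity under different weightings of later rewards.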


3.6 Markov Decision Processes

A reinforcement learning task that satisfies the Markov property is called a Markov decision process, or MDP. If the state and action spaces are finite, then it is called a finite Markov decision process (finite MDP).
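
One possible way to write down a finite MDP is as a table of transition outcomes keyed by state and action. The sketch below is an assumption-laden illustration: the state and action names, the numbers, and the helpers `is_valid_finite_mdp` and `expected_reward` are all hypothetical.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# dynamics[(s, a)] lists (probability, next_state, reward) outcomes.
# The outcome distribution depends only on (s, a): that is the Markov property.
Dynamics = Dict[Tuple[State, Action], List[Tuple[float, State, float]]]


def is_valid_finite_mdp(states, actions, dynamics, tol=1e-9):
    """Check that every (state, action) pair has a proper outcome distribution."""
    for s in states:
        for a in actions:
            outcomes = dynamics.get((s, a), [])
            if any(p < 0 or s2 not in states for p, s2, _ in outcomes):
                return False
            if abs(sum(p for p, _, _ in outcomes) - 1.0) > tol:
                return False
    return True


def expected_reward(dynamics, s, a):
    """Expected immediate reward for taking action a in state s."""
    return sum(p * r for p, _, r in dynamics[(s, a)])


states = ["low", "high"]      # hypothetical finite state set
actions = ["wait", "search"]  # hypothetical finite action set
dynamics: Dynamics = {
    ("high", "wait"): [(1.0, "high", 0.5)],
    ("high", "search"): [(0.5, "high", 1.0), (0.5, "low", 3.0)],
    ("low", "wait"): [(1.0, "low", 0.5)],
    ("low", "search"): [(0.25, "high", -3.0), (0.75, "low", 1.0)],
}

valid = is_valid_finite_mdp(states, actions, dynamics)
r_high_search = expected_reward(dynamics, "high", "search")
```

Because both sets are finite, the whole task fits in one dictionary, which is what makes finite MDPs convenient to enumerate and check.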