Comment:
Published in May 2020. This is a tutorial paper on offline RL.
Problem:
Offline RL: reinforcement learning algorithms that utilize previously collected data, without
additional online data collection.
However, the fact that reinforcement learning algorithms provide a fundamentally online learning
paradigm is also one of the biggest obstacles to their widespread adoption.
Summary:
RL algorithm category:
1.1 Policy Gradient
1.2 Approximate dynamic programming (Yu: value-function methods, e.g., DQN)
1.3 Actor-Critic
1.4 Model-based reinforcement learning (Yu: why is it a separate category?)
Offline RL
Q-learning algorithms, actor-critic algorithms that utilize Q-functions, and many model-based reinforcement learning algorithms are off-policy algorithms. However, off-policy algorithms still often employ additional interaction (i.e., online data collection) during the learning process. Therefore, the term “fully off-policy” is sometimes used to indicate that no additional online data collection is performed.
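A minimal sketch of the "fully off-policy" setting above: tabular Q-learning run over a fixed transition buffer, with no environment interaction at all. The tiny MDP and the logged transitions here are hypothetical, invented only for illustration.

```python
import numpy as np

# Tabular Q-learning over a fixed dataset: "fully off-policy" in the sense
# above, since no new transitions are collected during learning.
# The toy MDP (4 states, 2 actions) and logged transitions are made up.
n_states, n_actions = 4, 2

# Hypothetical (s, a, r, s', done) transitions from an unknown behavior policy.
dataset = [
    (0, 0, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 0, 1.0, 3, True),
    (0, 1, 0.0, 2, False),
    (2, 0, 1.0, 3, True),
]

Q = np.zeros((n_states, n_actions))
gamma, alpha = 0.99, 0.5

for _ in range(200):  # repeated sweeps over the static buffer
    for s, a, r, s_next, done in dataset:
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])  # standard Q-learning update
```

Swapping the table for a neural network and the full sweeps for minibatch sampling gives the familiar deep variant; the point is only that the update never queries the environment.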
2.4 What Makes Offline Reinforcement Learning Difficult?
A more subtle but practically more important challenge is making and answering counterfactual queries. Counterfactual queries are, intuitively, “what if” questions. The fundamental challenge with making such counterfactual queries is distributional shift: while our function
approximator (policy, value function, or model) might be trained under one distribution, it will be evaluated on a different distribution.
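A toy illustration of this distributional-shift failure mode (all numbers are made up): a Q-function is only ever fit on the actions that appear in the dataset, but greedy action selection takes a max over all actions, including ones whose estimated values are pure extrapolation.

```python
import numpy as np

# Counterfactual query under distributional shift: Q-values are trained only
# on in-dataset actions, but argmax is taken over every action.
n_actions = 5
true_q = np.array([1.0, 0.2, 0.2, 0.2, 0.2])  # action 0 is truly best

q_hat = np.zeros(n_actions)
q_hat[0], q_hat[1] = 1.0, 0.2                  # fit on observed actions 0 and 1
q_hat[2], q_hat[3], q_hat[4] = 3.1, -2.0, 0.5  # never-trained extrapolations

greedy = int(np.argmax(q_hat))  # selects action 2
# The chosen action was never seen in the data, and its estimated value (3.1)
# exceeds the best true value (1.0): the "what if" answer is fiction.
```

This is exactly the train/evaluate distribution mismatch described above: the estimator is queried where it has no training data, and the max operator preferentially seeks out such errors.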
3 Offline Evaluation and Reinforcement Learning via Importance Sampling
Yu: Is PPO one of these?
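A sketch of the ordinary per-trajectory importance-sampling estimator for offline evaluation: weight each trajectory's return by the product of per-step probability ratios between the target and behavior policies. The policies and trajectories below are hypothetical, and the behavior-policy probabilities are assumed to be logged.

```python
import numpy as np

# Per-trajectory (ordinary) importance sampling for off-policy evaluation.
# Policies are tables of action probabilities over a 3-state, 2-action toy MDP.
behavior = np.array([[0.5, 0.5]] * 3)  # beta(a|s), the data-collection policy
target   = np.array([[0.9, 0.1]] * 3)  # pi(a|s), the policy to evaluate

# Each trajectory: (state, action, reward) tuples collected under `behavior`.
trajectories = [
    [(0, 0, 1.0), (1, 0, 1.0)],
    [(0, 1, 0.0), (2, 1, 0.0)],
]

def is_estimate(trajs, pi, beta, gamma=1.0):
    vals = []
    for traj in trajs:
        ratio, ret, disc = 1.0, 0.0, 1.0
        for s, a, r in traj:
            ratio *= pi[s, a] / beta[s, a]  # cumulative importance weight
            ret += disc * r
            disc *= gamma
        vals.append(ratio * ret)
    return float(np.mean(vals))

estimate = is_estimate(trajectories, target, behavior)
```

The product of ratios makes the estimator unbiased but high-variance as horizons grow, which is why the paper's Section 3 discusses variance-reduction variants (e.g., weighted and per-decision importance sampling).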
4 Offline Reinforcement Learning via Dynamic Programming
4.2 Distributional Shift in Offline Reinforcement Learning via Dynamic Programming
Yu: SAC is one of these.
5 Offline Model-Based Reinforcement Learning
5.1 Model Exploitation and Distribution Shift
5.3 Challenges and Open Problems
model-based reinforcement learning appears to be a natural fit for the offline RL problem
setting ...
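The model-exploitation problem of Section 5.1 can be sketched with a toy fit (true dynamics, dataset range, and model class are all invented here): a dynamics model trained on a narrow slice of states looks accurate in-distribution, but a planner that steers outside the data support sees compounding error.

```python
import numpy as np

# Model exploitation sketch: a linear dynamics model fit on a narrow state
# range is accurate there but badly wrong off the data support.
def true_step(s, a):
    # Unknown nonlinear dynamics (hypothetical).
    return s + a - 0.1 * s**2

# Offline data only covers states near 0, where the quadratic term is tiny.
S = np.linspace(-0.5, 0.5, 50)
A = np.zeros_like(S)
S_next = true_step(S, A)

# Least-squares fit of a linear model s' ~ w * s (good on this narrow support).
w = (S @ S_next) / (S @ S)

in_dist_err  = abs(w * 0.3 - true_step(0.3, 0.0))  # tiny error near the data
out_dist_err = abs(w * 5.0 - true_step(5.0, 0.0))  # large error far from it
```

A planner maximizing predicted return under the learned model will happily visit those far-from-data states where the model's optimism is unearned, which is why offline model-based methods typically penalize or truncate rollouts outside the data distribution.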
7 Discussion and Perspectives
As a result, the standard off-policy training methods in these two categories (importance sampling and dynamic programming) have generally proven unsuitable for the kinds of complex domains typically studied in modern deep reinforcement learning.
Key challenge in offline RL: distributional shift due to differences between the learned policy and the behavior policy.
It is also still an open theoretical question whether model-based RL methods can, even in theory, improve over model-free dynamic programming algorithms.
Link: https://arxiv.org/pdf/2005.01643v1.pdf