This paper was published on 27 Feb 2019, so it is earlier than the UCB MBRL benchmark paper (#5), and it is cited by the "When to Trust Your Model: Model-Based Policy Optimization" paper (#9).
Problem:
Model-based reinforcement learning (RL) is considered a promising approach to reducing the sample complexity that hinders model-free RL. However, the theoretical understanding of such methods has been rather limited.
Innovation:
This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. We design a meta-algorithm with a theoretical guarantee of monotone improvement to a local maximum of the expected reward. The meta-algorithm iteratively builds a lower bound of the expected reward based on the estimated dynamical model and sample trajectories, and then maximizes the lower bound jointly over the policy and the model.
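Concretely (my paraphrase of the paper's construction, with $V^{\pi, M}$ the expected return of policy $\pi$ under dynamics $M$, $M^\star$ the true dynamics, $\widehat{M}$ the learned model, and $D$ the discrepancy bound), the lower bound and the joint update look like:

```latex
% Lower bound on the true return, valid for policies \pi
% in a neighborhood of the reference policy \pi_k:
V^{\pi, M^\star} \;\ge\; V^{\pi, \widehat{M}} - D_{\pi_k}\bigl(\widehat{M}, \pi\bigr)

% Meta-algorithm: maximize this lower bound jointly over policy and model:
(\pi_{k+1}, M_{k+1}) \;=\; \operatorname*{arg\,max}_{\pi,\, M}\; V^{\pi, M} - D_{\pi_k}(M, \pi)
```

Since the discrepancy vanishes at the true dynamics ($D_{\pi_k}(M^\star, \pi_k) = 0$), the bound is tight at the current iterate, so each maximization step can only raise a valid lower bound on the true return; this is what yields the monotone-improvement guarantee.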
Comment:
Review 1: The paper proposes a framework for designing model-based RL algorithms. The framework is based on OFU (optimism in the face of uncertainty), and within this framework the authors develop an algorithm (a variant of SLBO) that achieves SOTA performance on MuJoCo tasks.
Response 2: Indeed, our framework can capture all parameterized models (including linear models and even tabular MDPs); however, our focus is on non-linear models. The distinction from previous papers is that ours is the first framework that can show monotone improvement and handle uncertainty quantification (via a discrepancy bound) for non-linear models.
Link: OpenReview
Code: https://github.com/roosephu/slbo
The SLBO algorithm roughly alternates between (i) fitting the dynamics model on all real trajectories collected so far and (ii) improving the policy (via TRPO in the paper) on rollouts from the learned model.
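As an illustration only (not the authors' code: the 1-D linear system, least-squares model fit, and grid-search policy step are my simplifications, whereas the paper uses neural dynamics models with a multi-step loss and TRPO), here is a minimal runnable sketch of the SLBO-style outer loop: collect real data, refit the model, then optimize the policy inside the learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
A_TRUE, B_TRUE = 0.9, 0.5  # unknown true linear dynamics: s' = A*s + B*u

def rollout_real(k, horizon=20):
    """Collect (s, u, s') transitions from the real system with policy u = -k*s."""
    s, data = 1.0, []
    for _ in range(horizon):
        u = -k * s + 0.01 * rng.standard_normal()  # small exploration noise
        s_next = A_TRUE * s + B_TRUE * u
        data.append((s, u, s_next))
        s = s_next
    return data

def fit_model(data):
    """Least-squares fit of (a, b) in s' ~ a*s + b*u from real transitions."""
    X = np.array([[s, u] for s, u, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    return a, b

def model_return(k, a, b, horizon=20):
    """Return of policy u = -k*s rolled out inside the learned model (reward = -s^2)."""
    s, ret = 1.0, 0.0
    for _ in range(horizon):
        s = a * s + b * (-k * s)
        ret += -s * s
    return ret

# SLBO-style outer loop: collect real data -> refit model on all data so far
# -> improve the policy against the learned model (grid search stands in for TRPO).
k, data = 0.0, []
for _ in range(3):
    data += rollout_real(k)
    a_hat, b_hat = fit_model(data)
    candidates = np.linspace(0.0, 3.0, 61)
    k = max(candidates, key=lambda c: model_return(c, a_hat, b_hat))
```

Because the toy system is deterministic, the least-squares fit recovers the true dynamics, and the policy step drives the closed-loop gain `a - b*k` toward zero; the real algorithm replaces both steps with gradient-based model fitting and TRPO, interleaved for several inner iterations per batch of real data.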