This paper is closely related to "Benchmarking Model-Based Reinforcement Learning" (#5). It was published earlier, in June 2019, and its main authors are from UC Berkeley, while #5 is from the University of Toronto.
Problem:
In this paper, we study the role of model usage in policy optimization both theoretically and empirically.
Innovation:
We then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. Our main contribution is a practical algorithm built on these insights, which we call model-based policy optimization (MBPO), that makes limited use of a predictive model to achieve pronounced improvements in performance compared to other model-based approaches. More specifically, we disentangle the task horizon and model horizon by querying the model only for short rollouts.
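A minimal sketch of that branched-rollout idea (the `env_buffer`, `model_buffer`, `model`, and `policy` interfaces below are illustrative assumptions, not the paper's code): each rollout starts from a real state already stored in the environment buffer and is continued for only k steps under the learned model, so model error compounds over k steps rather than over the full task horizon.

```python
def branched_rollouts(env_buffer, model_buffer, model, policy,
                      n_rollouts=400, k=1):
    """Generate short model rollouts branched from real data.

    Start states are sampled from the buffer of real transitions (D_env);
    the current policy is then rolled out for only k steps under the learned
    model, and the imagined transitions are stored in D_model.
    """
    for _ in range(n_rollouts):
        state = env_buffer.sample_state()          # branch point: a real state
        for _ in range(k):
            action = policy.act(state)
            next_state, reward = model.step(state, action)   # learned dynamics, not the env
            model_buffer.add(state, action, reward, next_state, False)  # termination handling omitted
            state = next_state
```

Because k is much smaller than the task horizon, compounding model error stays small while the policy still receives far more training transitions than the environment alone provides.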
Predictive model:
In our work, we use a bootstrap ensemble of dynamics models p_θ^i. Each member of the ensemble is a probabilistic neural network whose outputs parametrize a Gaussian.
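One way such an ensemble can look, sketched in PyTorch (the layer widths, SiLU activation, log-variance clamp, and ensemble size of 7 are my assumptions, not details quoted from this note; per-member bootstrap training on resampled data is described but not shown):

```python
import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """One ensemble member: predicts mean and log-variance of the state change
    and the reward given (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        out_dim = state_dim + 1                      # delta-state + reward
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * out_dim),          # mean and log-variance
        )

    def forward(self, state, action):
        mean, log_var = self.net(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        return mean, log_var.clamp(-10.0, 2.0)       # keep variances in a sane range


class Ensemble:
    """Bootstrap ensemble of dynamics models: each member would be fit on its
    own resampled subset of D_env (training loop not shown); at prediction
    time a random member is queried and its Gaussian output is sampled."""
    def __init__(self, state_dim, action_dim, n_members=7):
        self.members = [GaussianDynamics(state_dim, action_dim)
                        for _ in range(n_members)]

    @torch.no_grad()
    def step(self, state, action):
        member = self.members[torch.randint(len(self.members), (1,)).item()]
        mean, log_var = member(state, action)
        sample = mean + torch.randn_like(mean) * (0.5 * log_var).exp()
        next_state = state + sample[..., :-1]        # the model predicts state deltas
        reward = sample[..., -1]
        return next_state, reward
```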
Policy optimization:
We adopt soft actor-critic (SAC) (Haarnoja et al., 2018) as our policy optimization algorithm.
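For reference, the heart of SAC's critic update is the entropy-regularized Bellman target; a minimal sketch (the `policy.sample` and `q_target*` interfaces and the batch layout are placeholder assumptions):

```python
import torch

def soft_bellman_target(batch, policy, q_target1, q_target2, alpha, gamma=0.99):
    """Entropy-regularized TD target used by SAC's critics:
    y = r + gamma * (1 - done) * ( min_i Q_target_i(s', a') - alpha * log pi(a'|s') ),
    with a' sampled from the current policy at s'.
    """
    with torch.no_grad():
        next_action, log_prob = policy.sample(batch["next_state"])
        q_min = torch.min(q_target1(batch["next_state"], next_action),
                          q_target2(batch["next_state"], next_action))
        return batch["reward"] + gamma * (1.0 - batch["done"]) * (q_min - alpha * log_prob)
```

MBPO keeps SAC itself unchanged and simply performs many such gradient updates per real environment step, with training batches drawn largely from the model-generated buffer.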
Model usage:
MBPO uses the branching strategy described in Section 4.2, in which model rollouts begin from the state distribution of a different policy under the true environment dynamics; in practice, start states are sampled from the real-data buffer collected by earlier policies, as in the rollout sketch above.
Comment:
This looks like an incremental and somewhat tricky algorithm, but it sounds more practical for real-world usage.
The algorithm:
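A minimal Python-style sketch of the overall MBPO loop as described in the paper (the component names, gym-style `env.step` interface, and hyperparameter defaults are illustrative assumptions, not the official implementation):

```python
# Sketch of the MBPO training loop. `model`, `sac`, the replay buffers, and
# `branched_rollouts` (sketched earlier) are assumed components.

def train_mbpo(env, model, sac, env_buffer, model_buffer,
               n_epochs=100, steps_per_epoch=1000,
               rollouts_per_step=400, k=1, grad_updates_per_step=20):
    state = env.reset()
    for _ in range(n_epochs):
        model.train(env_buffer)                   # fit the ensemble on D_env (max. likelihood)
        for _ in range(steps_per_epoch):
            # 1) one real environment step with the current policy, stored in D_env
            action = sac.act(state)
            next_state, reward, done, _ = env.step(action)
            env_buffer.add(state, action, reward, next_state, done)
            state = env.reset() if done else next_state

            # 2) short rollouts branched from real states, stored in D_model
            branched_rollouts(env_buffer, model_buffer, model, sac,
                              n_rollouts=rollouts_per_step, k=k)

            # 3) many SAC gradient updates per real step, on model-generated data
            for _ in range(grad_updates_per_step):
                sac.update(model_buffer.sample(batch_size=256))
```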
Link: arXiv. Code: https://github.com/JannerM/mbpo