This paper is closely related to "Benchmarking Model-Based Reinforcement Learning" (#5). It was published earlier, in June 2019, and its main authors are from UC Berkeley, while #5 is from the University of Toronto.
Problem:
In this paper, we study the role of model usage in policy optimization both theoretically and empirically.
Innovation:
We then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. Our main contribution is a practical algorithm built on these insights, which we call model-based policy optimization (MBPO), that makes limited use of a predictive model to achieve pronounced improvements in performance compared to other model-based approaches. More specifically, we disentangle the task horizon and model horizon by querying the model only for short rollouts.
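A minimal sketch of that branched-rollout idea (the `env_buffer`, `model_buffer`, `model`, and `policy` interfaces below are illustrative assumptions, not the paper's code): each rollout starts from a real state already stored in the environment buffer and is continued for only k steps under the learned model, so model error compounds over k steps rather than over the full task horizon.

```python
def branched_rollouts(env_buffer, model_buffer, model, policy,
                      n_rollouts=400, k=1):
    """Generate short model rollouts branched from real data.

    Start states are sampled from the buffer of real transitions (D_env);
    the current policy is then rolled out for only k steps under the learned
    model, and the imagined transitions are stored in D_model.
    """
    for _ in range(n_rollouts):
        state = env_buffer.sample_state()          # branch point: a real state
        for _ in range(k):
            action = policy.act(state)
            next_state, reward = model.step(state, action)   # learned dynamics, not the env
            model_buffer.add(state, action, reward, next_state, False)  # termination handling omitted
            state = next_state
```

Because k is much smaller than the task horizon, compounding model error stays small while the policy still receives far more training transitions than the environment alone provides.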
Predictive model:
In our work, we use a bootstrap ensemble of dynamics models p_θ^i. Each member of the ensemble is a probabilistic neural network whose outputs parametrize a Gaussian.
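One way such an ensemble can look, sketched in PyTorch (the layer widths, SiLU activation, log-variance clamp, and ensemble size of 7 are my assumptions, not details quoted from this note; per-member bootstrap training on resampled data is described but not shown):

```python
import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """One ensemble member: predicts mean and log-variance of the state change
    and the reward given (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        out_dim = state_dim + 1                      # delta-state + reward
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * out_dim),          # mean and log-variance
        )

    def forward(self, state, action):
        mean, log_var = self.net(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        return mean, log_var.clamp(-10.0, 2.0)       # keep variances in a sane range


class Ensemble:
    """Bootstrap ensemble of dynamics models: each member would be fit on its
    own resampled subset of D_env (training loop not shown); at prediction
    time a random member is queried and its Gaussian output is sampled."""
    def __init__(self, state_dim, action_dim, n_members=7):
        self.members = [GaussianDynamics(state_dim, action_dim)
                        for _ in range(n_members)]

    @torch.no_grad()
    def step(self, state, action):
        member = self.members[torch.randint(len(self.members), (1,)).item()]
        mean, log_var = member(state, action)
        sample = mean + torch.randn_like(mean) * (0.5 * log_var).exp()
        next_state = state + sample[..., :-1]        # the model predicts state deltas
        reward = sample[..., -1]
        return next_state, reward
```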
Policy optimization:
We adopt soft actor-critic (SAC) (Haarnoja et al., 2018) as our policy optimization algorithm.
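For reference, the heart of SAC's critic update is the entropy-regularized Bellman target; a minimal sketch (the `policy.sample` and `q_target*` interfaces and the batch layout are placeholder assumptions):

```python
import torch

def soft_bellman_target(batch, policy, q_target1, q_target2, alpha, gamma=0.99):
    """Entropy-regularized TD target used by SAC's critics:
    y = r + gamma * (1 - done) * ( min_i Q_target_i(s', a') - alpha * log pi(a'|s') ),
    with a' sampled from the current policy at s'.
    """
    with torch.no_grad():
        next_action, log_prob = policy.sample(batch["next_state"])
        q_min = torch.min(q_target1(batch["next_state"], next_action),
                          q_target2(batch["next_state"], next_action))
        return batch["reward"] + gamma * (1.0 - batch["done"]) * (q_min - alpha * log_prob)
```

MBPO keeps SAC itself unchanged and simply performs many such gradient updates per real environment step, with training batches drawn largely from the model-generated buffer.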
Model usage:
MBPO uses the branching strategy described in Section 4.2, in which model rollouts begin from the state distribution of a different policy under the true environment dynamics; in practice, start states are sampled from the real-data buffer collected by earlier policies, as in the rollout sketch above.
Comment:
This looks like an incremental and somewhat tricky algorithm, but it sounds more practical for real-world usage.
The algorithm:
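A minimal Python-style sketch of the overall MBPO loop as described in the paper (the component names, gym-style `env.step` interface, and hyperparameter defaults are illustrative assumptions, not the official implementation):

```python
# Sketch of the MBPO training loop. `model`, `sac`, the replay buffers, and
# `branched_rollouts` (sketched earlier) are assumed components.

def train_mbpo(env, model, sac, env_buffer, model_buffer,
               n_epochs=100, steps_per_epoch=1000,
               rollouts_per_step=400, k=1, grad_updates_per_step=20):
    state = env.reset()
    for _ in range(n_epochs):
        model.train(env_buffer)                   # fit the ensemble on D_env (max. likelihood)
        for _ in range(steps_per_epoch):
            # 1) one real environment step with the current policy, stored in D_env
            action = sac.act(state)
            next_state, reward, done, _ = env.step(action)
            env_buffer.add(state, action, reward, next_state, done)
            state = env.reset() if done else next_state

            # 2) short rollouts branched from real states, stored in D_model
            branched_rollouts(env_buffer, model_buffer, model, sac,
                              n_rollouts=rollouts_per_step, k=k)

            # 3) many SAC gradient updates per real step, on model-generated data
            for _ in range(grad_updates_per_step):
                sac.update(model_buffer.sample(batch_size=256))
```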
Link: arXiv. Code: https://github.com/JannerM/mbpo