QiXuanWang / LearningFromTheBest

This project lists the best books, courses, tutorials, and methods for learning specific topics

Model Based Reinforcement Learning for Atari By: Łukasz Kaiser, Mohammad Babaeizadeh, ..., Henryk Michalewski #15

Open · QiXuanWang opened this issue 4 years ago

QiXuanWang commented 4 years ago

Link: OpenReview
Code: http://bit.ly/2wjgn1a

Published on 26 Sep 2019 (ICLR 2020)

Comment from OpenReview:

Decision: Accept (Spotlight)

Comment: This paper presents a model-based RL approach to Atari games based on video prediction. The architecture performs remarkably well with a limited amount of interactions. This is a very significant result on a question that engages many in the research community.

Reviewers all agree that the paper is good and should be published. There is some disagreement about the novelty of it. However, as one reviewer states, the significance of the results is more important than the novelty. Many conference attendees would like to hear about it.

Based on this, I think the paper can be accepted for oral presentation.

Problem:

Model-free RL methods such as Rainbow and PPO can master Atari games, but they typically need orders of magnitude more environment interaction than a human player. The paper asks whether model-based RL with a learned world model can reach reasonable performance from a small fraction of that experience.

Innovation:

We use video prediction models, a model-based reinforcement learning algorithm, and 2 hours of gameplay per game to train agents for 26 Atari games.
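
To make the overall approach concrete, here is a minimal sketch of the SimPLe-style training loop. All names and interfaces (`env`, `policy`, `world_model`, `collect_rollouts`) are hypothetical placeholders for illustration, not the authors' released code; the iteration count and rollout horizon roughly follow the paper:

```python
"""Minimal sketch of a SimPLe-style loop: collect a little real data,
fit the video-prediction world model, train the policy inside the model.
All class/function interfaces here are assumed, not the authors' code."""

import random


def collect_rollouts(env, policy, n_steps):
    """Run the policy in the real environment; return (obs, action, reward, next_obs) tuples.
    Assumes env.step returns (next_obs, reward, done) and env.reset returns an observation."""
    data, obs = [], env.reset()
    for _ in range(n_steps):
        action = policy.act(obs)
        next_obs, reward, done = env.step(action)
        data.append((obs, action, reward, next_obs))
        obs = env.reset() if done else next_obs
    return data


def train_simple(env, policy, world_model, n_iterations=15,
                 real_steps_per_iter=6400, rollout_horizon=50):
    dataset = []
    for _ in range(n_iterations):
        # 1. A small amount of real interaction per iteration; across all
        #    iterations this totals roughly two hours of real-time play.
        dataset += collect_rollouts(env, policy, real_steps_per_iter)

        # 2. Supervised fit of the world model on all real transitions so far.
        world_model.fit(dataset)

        # 3. Policy optimization (PPO in the paper) on short imagined rollouts
        #    branched from real frames, which limits compounding model error.
        for _ in range(1000):
            start_obs = random.choice(dataset)[0]
            imagined = world_model.rollout(policy, start_obs, horizon=rollout_horizon)
            policy.update(imagined)
    return policy
```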

Conclusion/Future Work:

While SimPLe is able to learn more quickly than model-free methods, it does have limitations. First, the final scores are on the whole lower than the best state-of-the-art model-free methods. This can be improved with better dynamics models and, while generally common with model-based RL algorithms, suggests an important direction for future work. Another, less obvious limitation is that the performance of our method generally varied substantially between different runs on the same game. The complex interactions between the model, policy, and data collection were likely responsible for this. In future work, models that capture uncertainty via Bayesian parameter posteriors or ensembles (Kurutach et al., 2018; Chua et al., 2018) may improve robustness.
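
As a rough illustration of the ensemble idea cited above (Kurutach et al., 2018; Chua et al., 2018): train several dynamics models on bootstrapped views of the data and read their disagreement as epistemic uncertainty. This is a hypothetical sketch, not part of SimPLe; the member models are assumed to expose `fit`/`predict`:

```python
# Hypothetical sketch of ensemble-based uncertainty, not part of SimPLe:
# K independently trained dynamics models whose disagreement flags
# states where imagined rollouts should not be trusted.

import numpy as np


class EnsembleDynamics:
    def __init__(self, models):
        self.models = models  # K independently initialized member models

    def fit(self, transitions):
        for m in self.models:
            # Bootstrap resampling gives each member a different view of the data.
            idx = np.random.randint(len(transitions), size=len(transitions))
            m.fit([transitions[i] for i in idx])

    def predict(self, obs, action):
        preds = np.stack([m.predict(obs, action) for m in self.models])
        mean = preds.mean(axis=0)
        # High variance across members marks unreliable regions of the model;
        # a planner or policy can avoid or down-weight rollouts through them.
        uncertainty = preds.var(axis=0)
        return mean, uncertainty
```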

Comments: The key point is the two hours of training data, which means far fewer samples; the comparison with Rainbow and PPO likewise shows much less environment interaction. The catch is that the model takes raw pixels as input and relies on a very complex world model. I don't know whether it could be applied to other fields, though the authors claim robotics and autonomous driving are their next targets. Could it be applied to domains with non-pixel inputs? (See the sketch below.)
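
On the non-pixel question: nothing in the overall collect/fit/imagine loop is specific to images, so in principle the heavy video-prediction network could be swapped for a much simpler dynamics model over state vectors. A hypothetical sketch (assuming PyTorch; not from the paper) of such a replacement:

```python
# Hypothetical sketch, not from the paper: for non-pixel domains the video
# model could be replaced by a small MLP predicting the next state and reward.

import torch
import torch.nn as nn


class MLPDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # state delta + reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        delta, reward = out[..., :-1], out[..., -1]
        # Predicting the delta rather than the absolute next state is a
        # common stabilizing choice in low-dimensional model-based RL.
        return state + delta, reward
```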

QiXuanWang commented 4 years ago

Referenced #6