Decision: Accept (Spotlight)
Comment: This paper presents a model-based RL approach to Atari games based on video prediction. The architecture performs remarkably well with a limited amount of interactions. This is a very significant result on a question that engages many in the research community.
Reviewers all agree that the paper is good and should be published. There is some disagreement about the novelty of it. However, as one reviewer states, the significance of the results is more important than the novelty. Many conference attendees would like to hear about it.
Based on this, I think the paper can be accepted for oral presentation.
Problem:
Innovation:
We use video prediction models, a model-based reinforcement learning algorithm, and two hours of gameplay per game to train agents for 26 Atari games.
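The recipe above alternates between gathering real experience, fitting a world model, and training the policy inside that model. The following is a minimal sketch of that loop with toy stand-ins (all names and the tabular "world model" are hypothetical illustrations, not the authors' code; a real SimPLe run uses a video prediction model and PPO):

```python
import random

class WorldModel:
    """Learned dynamics model; here just a table of observed transitions."""
    def __init__(self):
        self.transitions = {}  # (state, action) -> next_state

    def train(self, data):
        for s, a, s2 in data:
            self.transitions[(s, a)] = s2

    def step(self, s, a):
        # Fall back to staying in place for unseen state-action pairs.
        return self.transitions.get((s, a), s)

def collect_real_data(policy, steps=50):
    """Roll the policy in a toy 'real' environment: next state = (s + a) % 10."""
    data, s = [], 0
    for _ in range(steps):
        a = policy(s)
        s2 = (s + a) % 10
        data.append((s, a, s2))
        s = s2
    return data

def train_policy_in_model(model):
    """Stand-in for PPO trained inside the learned model: always act 1."""
    return lambda s: 1

# SimPLe-style main loop: alternate real-data collection (the scarce,
# two-hour budget), world-model training, and policy training in the model.
policy = lambda s: random.choice([0, 1])
model = WorldModel()
for _ in range(3):
    data = collect_real_data(policy)
    model.train(data)
    policy = train_policy_in_model(model)
```

The point of the structure is that only `collect_real_data` touches the real environment; the policy improves almost entirely from rollouts in the learned model, which is what makes the sample budget so small.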
Conclusion/Future Work:
While SimPLe is able to learn more quickly than model-free methods, it does have limitations. First, the final scores are on the whole lower than the best state-of-the-art model-free methods. This can be improved with better dynamics models and, while generally common with model-based RL algorithms, suggests an important direction for future work. Another, less obvious limitation is that the performance of our method generally varied substantially between different runs on the same game. The complex interactions between the model, policy, and data collection were likely responsible for this. In future work, models that capture uncertainty via Bayesian parameter posteriors or ensembles (Kurutach et al., 2018; Chua et al., 2018) may improve robustness.
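The ensemble remedy mentioned above can be illustrated with a small sketch: train several dynamics models and use their disagreement as an uncertainty signal (a hypothetical 1-D illustration of the idea in Kurutach et al., 2018, not code from the paper):

```python
import statistics

def make_model(offset):
    # Each ensemble member is a slightly different 1-D dynamics model,
    # standing in for models trained on different bootstrapped data.
    return lambda s, a: s + a + offset

ensemble = [make_model(o) for o in (0.0, 0.1, -0.1)]

def predict_with_uncertainty(s, a):
    preds = [m(s, a) for m in ensemble]
    mean = statistics.mean(preds)
    # High variance across members flags states where the model is unsure;
    # a robust planner can then avoid or down-weight those predictions.
    return mean, statistics.pvariance(preds)

mean, var = predict_with_uncertainty(1.0, 2.0)
```

In a full system the variance would feed back into planning or data collection, which is how uncertainty-aware models could reduce the run-to-run variability the authors observed.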
Comments:
The key point is that only two hours of training data are used, which means far fewer samples. The comparison with Rainbow and PPO likewise shows far fewer environment interactions.
A limitation of this model is that it uses raw pixels as input and a very complex world model. It is unclear whether it could be applied to other fields, though the authors state that robotics and autonomous driving are their next targets. Could it be applied to domains with non-pixel inputs?
Link: OpenReview Code: http://bit.ly/2wjgn1a
Published on 26 Sep 2019 (ICLR 2020)