QiXuanWang / LearningFromTheBest

This project lists the best books, courses, tutorials, and methods for learning specific topics.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures #31

Opened by QiXuanWang 4 years ago.

Link: semanticscholar

Code: https://github.com/deepmind/scalable_agent

This is a Google DeepMind paper published at ICML 2018, with very few references.

Comment: Compared with A3C/A2C, the main benefit is much better scalability when running on many distributed machines (200+ CPUs). Compared with A3C, this massive distribution comes from V-trace combined with a strict division of labor: the learner is responsible only for learning, and the actors are responsible only for generating trajectories. There is no parameter

Problem:

Innovation: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace, which was critical for achieving learning stability.
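Because acting is decoupled from learning, an actor generates its trajectory under a behaviour policy μ that may lag several updates behind the learner's current policy π, so the learner's updates are off-policy. V-trace corrects for this with truncated importance weights of the form min(ρ̄, π(a|x)/μ(a|x)). The sketch below computes one such weight, assuming a softmax policy over discrete actions; the function names `softmax` and `truncated_is_weight` are hypothetical and not taken from the paper or the scalable_agent code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def truncated_is_weight(target_logits, behaviour_logits, action, clip=1.0):
    """Return min(clip, pi(a|x) / mu(a|x)) for a single step.

    target_logits:    learner's current policy pi (hypothetical softmax parameterization)
    behaviour_logits: the (possibly stale) policy mu the actor used to pick `action`
    clip:             truncation level, playing the role of rho_bar or c_bar
    """
    pi_a = softmax(target_logits)[action]
    mu_a = softmax(behaviour_logits)[action]
    return min(clip, pi_a / mu_a)


# Example: the actor still acts uniformly (mu = [0.5, 0.5]) while the learner
# has already moved to pi = [0.75, 0.25].
stale = [0.0, 0.0]
current = [math.log(3.0), 0.0]
w0 = truncated_is_weight(current, stale, action=0)  # ratio 1.5, truncated to 1.0
w1 = truncated_is_weight(current, stale, action=1)  # ratio 0.5, unchanged
```

Truncation caps the weight for actions the learner now prefers more than the actor did, which bounds the variance of the correction at the cost of some bias.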

IMPALA (Figure 1) uses an actor-critic setup to learn a policy π and a baseline function Vπ. The process of generating experiences is decoupled from learning the parameters of π and Vπ. The architecture consists of a set of actors, repeatedly generating trajectories of experience, and one or more learners that use the experiences sent from actors to learn π off-policy.
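The decoupling described in this paragraph can be illustrated with a toy, single-process sketch: actor threads only read a (possibly stale) copy of the parameters and push trajectories into a queue, while a single learner only consumes trajectories and publishes new parameters. Everything here is an assumption for illustration: the 2-armed bandit environment, the `ParameterStore` class, and the simple reward-driven logit update stand in for real environments and the V-trace actor-critic update; none of it is the scalable_agent API.

```python
import math
import queue
import random
import threading


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ParameterStore:
    """Hypothetical holder of the learner's latest policy logits plus a version counter."""

    def __init__(self, num_actions):
        self._lock = threading.Lock()
        self._logits = [0.0] * num_actions
        self._version = 0

    def read(self):
        with self._lock:
            return list(self._logits), self._version

    def write(self, logits):
        with self._lock:
            self._logits = list(logits)
            self._version += 1


def actor(seed, store, trajectories, num_unrolls, unroll_length):
    """Generates trajectories only; never computes gradients or updates parameters."""
    rng = random.Random(seed)
    for _ in range(num_unrolls):
        logits, version = store.read()  # behaviour policy mu: a possibly stale copy
        mu = softmax(logits)
        steps = []
        for _ in range(unroll_length):
            action = rng.choices(range(len(mu)), weights=mu)[0]
            reward = 1.0 if action == 0 else 0.0  # toy bandit: only arm 0 pays
            # mu(a|x) is shipped with the step so the learner could apply V-trace.
            steps.append((action, reward, mu[action]))
        trajectories.put({"policy_version": version, "steps": steps})


def learner(store, trajectories, num_trajectories, lr=0.1):
    """Consumes trajectories and updates parameters; never steps the environment."""
    lags = []
    for _ in range(num_trajectories):
        traj = trajectories.get()
        logits, version = store.read()
        lags.append(version - traj["policy_version"])  # policy lag of this trajectory
        for action, reward, _mu_a in traj["steps"]:
            logits[action] += lr * reward  # toy update, NOT the V-trace actor-critic loss
        store.write(logits)
    return lags


def run(num_actors=4, num_unrolls=5, unroll_length=10):
    store = ParameterStore(num_actions=2)
    trajectories = queue.Queue()
    threads = [
        threading.Thread(target=actor, args=(i, store, trajectories, num_unrolls, unroll_length))
        for i in range(num_actors)
    ]
    for t in threads:
        t.start()
    lags = learner(store, trajectories, num_actors * num_unrolls)
    for t in threads:
        t.join()
    return store, lags
```

Calling `run()` returns the final parameter store and the policy lag of every consumed trajectory; nonzero lags are exactly the off-policy gap that the truncated importance weights in V-trace are meant to correct.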

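Define the n-step V-trace target for $V(x_s)$:

$$v_s := V(x_s) + \sum_{t=s}^{s+n-1} \gamma^{t-s} \Big(\prod_{i=s}^{t-1} c_i\Big)\, \delta_t V,$$

where $\delta_t V := \rho_t \big(r_t + \gamma V(x_{t+1}) - V(x_t)\big)$ is a temporal-difference term for $V$, and $\rho_t := \min\big(\bar\rho, \frac{\pi(a_t \mid x_t)}{\mu(a_t \mid x_t)}\big)$ and $c_i := \min\big(\bar c, \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\big)$ are truncated importance sampling weights, with $\mu$ the actors' behaviour policy, $\pi$ the learner's policy, and $\bar\rho \ge \bar c$.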
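The n-step V-trace target v_s (a sum of discounted TD terms δ_t V = ρ_t(r_t + γV(x_{t+1}) − V(x_t)) weighted by products of truncated traces c_i) can be computed for a whole unroll in one backward pass, using the recursion v_s − V(x_s) = δ_s V + γ c_s (v_{s+1} − V(x_{s+1})) with v_n = V(x_n). The sketch below assumes a single unroll of length n with no episode termination inside it, takes the importance ratios π/μ as precomputed inputs, and lets each v_s use only the remaining steps of the unroll. The function names are hypothetical; `vtrace_target_direct` evaluates the sum literally and is included only as a cross-check.

```python
def vtrace_targets(values, bootstrap_value, rewards, ratios, gamma, rho_bar=1.0, c_bar=1.0):
    """Backward-recursive V-trace targets v_0..v_{n-1} for one unroll.

    values:          V(x_0) .. V(x_{n-1})
    bootstrap_value: V(x_n), used as v_n
    ratios:          pi(a_t|x_t) / mu(a_t|x_t) for t = 0..n-1
    """
    n = len(rewards)
    rhos = [min(rho_bar, r) for r in ratios]  # truncated weights rho_t
    cs = [min(c_bar, r) for r in ratios]      # truncated trace weights c_i
    next_values = list(values[1:]) + [bootstrap_value]
    deltas = [rhos[t] * (rewards[t] + gamma * next_values[t] - values[t]) for t in range(n)]

    vs = [0.0] * n
    acc = 0.0  # holds v_{s+1} - V(x_{s+1}); zero past the end of the unroll
    for s in reversed(range(n)):
        acc = deltas[s] + gamma * cs[s] * acc
        vs[s] = values[s] + acc
    return vs


def vtrace_target_direct(values, bootstrap_value, rewards, ratios, gamma, s,
                         rho_bar=1.0, c_bar=1.0):
    """Literal evaluation of the sum for a single index s (O(n^2), for checking only)."""
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    total = values[s]
    for t in range(s, n):
        rho_t = min(rho_bar, ratios[t])
        delta_t = rho_t * (rewards[t] + gamma * next_values[t] - values[t])
        prod_c = 1.0
        for i in range(s, t):
            prod_c *= min(c_bar, ratios[i])
        total += gamma ** (t - s) * prod_c * delta_t
    return total


values = [0.5, 0.2, -0.1]
rewards = [1.0, 0.0, 2.0]
on_policy = vtrace_targets(values, 0.3, rewards, [1.0, 1.0, 1.0], gamma=0.9)
off_policy = vtrace_targets(values, 0.3, rewards, [1.5, 0.4, 2.0], gamma=0.9)
```

When all ratios equal 1 (on-policy), the sum telescopes and v_s reduces to the ordinary n-step Bellman target Σ γ^{t−s} r_t + γ^{n−s} V(x_n); for the example above that gives v_0 = 1 + 0.81·2 + 0.729·0.3 = 2.8387. The backward recursion costs O(n) per unroll instead of O(n²) for the literal sum.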

Architecture: [figure]