Comment:
Published on Aug 23, 2019. This paper improves over HER (Hindsight Experience Replay) and GAIL (Generative Adversarial Imitation Learning).
Problem:
Designing rewards for Reinforcement Learning (RL) is challenging because a reward must convey the desired task, be efficient to optimize, and be easy to compute. The last requirement is particularly problematic when applying RL to robotics, where detecting whether the desired configuration has been reached may require considerable supervision and instrumentation.
Innovation:
In this work we investigate different approaches to incorporating demonstrations to drastically speed up convergence to a policy able to reach any goal, also surpassing the performance of an agent trained with other Imitation Learning (IL) algorithms.
Most previous work on IL is centered on trajectory following or on a single task. Furthermore, it is limited by the performance of the demonstrations, or relies on engineered rewards to improve upon them. In this work we first illustrate how IL methods can be extended to the goal-conditioned setting, and study a more powerful relabeling strategy that extracts additional information from the demonstrations. We then propose a novel algorithm, goalGAIL, and show it can outperform the demonstrator without the need for any additional reward.
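The expert-relabeling idea can be sketched as follows: apply HER-style hindsight relabeling to the demonstrations themselves, so each expert state-action pair is also paired with goals achieved later in the same demonstration. This is a minimal illustration, not the paper's implementation; the tuple layout, field order, and `k` relabels per step are assumptions.

```python
import random

def relabel_demonstration(demo, k=4, seed=0):
    """HER-style relabeling applied to an expert demonstration.

    demo: list of (state, action, achieved_goal) tuples.
    Returns goal-conditioned (state, action, goal) tuples where each
    goal is an achieved_goal from the same or a later step of the demo,
    so the expert data also teaches how to reach intermediate goals.
    """
    rng = random.Random(seed)
    relabeled = []
    for t, (s, a, _) in enumerate(demo):
        for _ in range(k):
            future = rng.randrange(t, len(demo))  # pick a future step
            g = demo[future][2]                   # its achieved goal
            relabeled.append((s, a, g))
    return relabeled

# Tiny 3-step demonstration on a grid (illustrative data).
demo = [((0, 0), "right", (1, 0)),
        ((1, 0), "up",    (1, 1)),
        ((1, 1), "right", (2, 1))]
data = relabel_demonstration(demo)
print(len(data))  # 3 steps * k=4 relabels = 12
```

The relabeled tuples can then be fed to goal-conditioned behavioral cloning, which is how the paper extracts more supervision from the same demonstrations.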
Key Techs:
4.1 Goal-conditioned Behavioral Cloning
4.2 Relabeling the expert
4.3 Goal-conditioned GAIL with Hindsight
5.2 Goal-conditioned GAIL with Hindsight: goalGAIL
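The core goalGAIL move is to condition the GAIL discriminator on the goal as well as the state-action pair, and use its output in place of the environment reward inside an off-policy, hindsight-relabeled update. A hedged sketch of that reward substitution, with a toy logistic discriminator and the standard `-log(1 - D)` GAIL reward form (both illustrative assumptions, not the paper's exact architecture):

```python
import math

def discriminator_prob(w, s, a, g):
    """Toy logistic discriminator D(s, a, g): probability that a
    goal-conditioned transition came from the expert.
    w: weight vector; s, a, g: feature vectors (illustrative)."""
    x = list(s) + list(a) + list(g)
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def gail_reward(w, s, a, g, eps=1e-8):
    """GAIL-style surrogate reward that replaces the task reward:
    high when the discriminator thinks the transition is expert-like."""
    d = discriminator_prob(w, s, a, g)
    return -math.log(1.0 - d + eps)

# Untrained (zero) weights give D = 0.5, i.e. reward -log(0.5).
print(gail_reward([0.0, 0.0, 0.0], [1.0], [1.0], [1.0]))
```

Because this reward is computed from learned features rather than instrumentation, the agent needs no hand-engineered task reward, which is the property that lets goalGAIL outperform the demonstrator.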
Link: https://arxiv.org/pdf/1906.05838.pdf Code: https://sites.google.com/view/goalconditioned-il/ Replication: https://openreview.net/forum?id=HJlCUp5M6H&noteId=VPGTimygxA