Published in NIPS 2017. This is a widely cited paper.
Also see: Advances in Experience Replay
It can be used with any off-policy algorithm, such as DDPG or DQN.
Problem:
Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoids the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum.
Innovation:
It can be combined with any off-policy RL algorithm. It is applicable whenever there are multiple goals which can be achieved, e.g. achieving each state of the system may be treated as a separate goal. Not only does HER improve the sample efficiency in this setting, but more importantly, it makes learning possible even if the reward signal is sparse and binary.
The idea behind Hindsight Experience Replay (HER) is very simple: after experiencing some episode s_0, s_1, ..., s_T, we store in the replay buffer every transition s_t → s_{t+1} not only with the original goal used for this episode but also with a subset of other goals.
One choice which has to be made in order to use HER is the set of additional goals used for replay. In the simplest version of the algorithm, each trajectory is replayed with the goal m(s_T), i.e. the goal achieved in the final state of the episode.
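The "final"-goal strategy above can be sketched as a small replay routine. This is a minimal illustration, not the paper's implementation: the helper names (`her_store`, `reward_fn`, `goal_of_state`) and the tuple layout of the replay buffer are assumptions made for the example.

```python
def her_store(episode, replay_buffer, reward_fn, goal_of_state):
    """Store each transition twice: once with the original goal, once with
    the hindsight goal m(s_T) achieved in the episode's final state.

    episode: list of (state, action, next_state, goal) tuples.
    reward_fn(next_state, goal): sparse binary reward (e.g. 0 on success, -1 otherwise).
    goal_of_state: the mapping m(s) from a state to the goal that state achieves.
    """
    # m(s_T): the goal achieved in the final state of the episode.
    final_goal = goal_of_state(episode[-1][2])
    for state, action, next_state, goal in episode:
        # Standard experience replay with the original goal.
        replay_buffer.append(
            (state, action, next_state, goal, reward_fn(next_state, goal))
        )
        # Hindsight replay: pretend the achieved final state was the goal,
        # so at least one transition in the episode receives a success reward.
        replay_buffer.append(
            (state, action, next_state, final_goal, reward_fn(next_state, final_goal))
        )
```

Even when the agent never reaches the original goal, the relabeled transitions give the off-policy learner a non-trivial reward signal, which is what makes learning with sparse binary rewards possible.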
Link: Semantic Scholar