QiXuanWang / LearningFromTheBest

This project lists the best books, courses, tutorials, and methods for learning a given subject

Data-Efficient Hierarchical Reinforcement Learning By: Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine #12

Open QiXuanWang opened 4 years ago

QiXuanWang commented 4 years ago

Link: arxiv


The majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios.

On-policy training is required because the lower-level policy keeps changing, which makes the problem non-stationary for the higher-level policy: old off-policy experience may exhibit different transitions conditioned on the same goals (Section 3.3).
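The non-stationarity described above can be seen in a toy example. This is only an illustrative sketch, not code from the paper: `rollout` and its `gain` parameter are hypothetical stand-ins for a lower-level policy that improves during training. The same stored goal leads to a different end state once the lower-level policy has changed, so an old high-level transition no longer describes what that goal would do now.

```python
import numpy as np

def rollout(state, goal, gain, steps=5):
    """Toy low-level controller (hypothetical): each step moves a
    fraction `gain` of the remaining distance toward state + goal."""
    target = state + goal
    for _ in range(steps):
        state = state + gain * (target - state)
    return state

s_t = np.array([0.0, 0.0])
goal = np.array([4.0, 0.0])  # high-level action: "move +4 along x"

# Same start state and same goal, but two versions of the low-level policy.
early = rollout(s_t, goal, gain=0.1)  # weak policy early in training
late = rollout(s_t, goal, gain=0.9)   # improved policy later in training

print("end state with early low-level policy:", early)
print("end state with later low-level policy:", late)
```

A replay buffer entry recorded with the early policy would pair this goal with the early end state, which the later policy would not reproduce.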


We propose to use off-policy experience for both higher and lower-level training. This poses a considerable challenge, since changes to the lower-level behaviors change the action space for the higher-level policy, and we introduce an off-policy correction to remedy this challenge.

Also see section 3.3
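As a rough sketch of the off-policy correction from section 3.3: for a stored high-level transition, the goal is relabeled with the candidate under which the current lower-level policy best explains the lower-level actions that were actually taken. The code below is my reading of that section, not the authors' implementation. It assumes goals live in the same space as states, approximates the log-probability by a negative squared error against the policy's mean action, and uses the original goal, the observed state change, and a few Gaussian samples around that change as candidates. The names `relabel_goal`, `low_policy_mean`, `n_samples`, and `sigma` are hypothetical.

```python
import numpy as np

def relabel_goal(states, actions, orig_goal, low_policy_mean,
                 n_samples=8, sigma=0.5, rng=None):
    """Return the candidate goal that maximizes the (approximate)
    log-likelihood of the logged low-level actions under the current
    low-level policy.

    states:  array of shape (c + 1, d), the states s_t ... s_{t+c}
    actions: array of shape (c, k), the actions a_t ... a_{t+c-1}
    low_policy_mean: callable (state, goal) -> mean action (assumed interface)
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)

    # Candidates: original goal, observed state change, Gaussian samples around it.
    delta = states[-1] - states[0]
    candidates = [np.asarray(orig_goal, dtype=float), delta]
    candidates += list(rng.normal(loc=delta, scale=sigma,
                                  size=(n_samples, delta.shape[0])))

    def log_prob(goal):
        total = 0.0
        g = goal
        for i in range(len(actions)):
            mu = low_policy_mean(states[i], g)
            # Log-likelihood up to a constant, assuming a fixed-variance Gaussian policy.
            total += -0.5 * np.sum((actions[i] - mu) ** 2)
            # Goal transition: keep the goal pointing at the same absolute target.
            g = states[i] + g - states[i + 1]
        return total

    scores = [log_prob(g) for g in candidates]
    return candidates[int(np.argmax(scores))]
```

The relabeled goal would then replace the original one in the high-level transition before it is used for an off-policy update.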


Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach substantially outperforms previous state-of-the-art techniques.

Comments: This is one of the important HRL papers that achieved SOTA results. The authors propose a new algorithm called "maximum likelihood-based action relabeling". It uses DDPG/TD3 as the base algorithm and mostly works on continuous action spaces. But some of the older papers it references use discrete action spaces, and I think extend to