This is not a paper but a summary of recent HRL research progress. It does not go into much detail, so each paper still needs to be reviewed individually.
Note that HRL is mostly aimed at the sparse-reward problem, since it can treat an arbitrary end state as a new goal.
Problem:
NA
Innovation/Contribution:
How to select a suitable HRL algorithm:
If the behaviours are completely specified → Options
If the behaviours are partially specified → HAM
If less domain knowledge is available → MAXQ, Learned Options
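The rule of thumb above can be written down as a small lookup, which is handy when tagging papers by the kind of prior knowledge they assume. This is only a sketch of the notes' own mapping: the names `DomainKnowledge`, `HRL_CHOICES` and `suggest_hrl_algorithms` are hypothetical and do not come from the blog or from any library.

```python
from enum import Enum
from typing import Dict, List


class DomainKnowledge(Enum):
    """How much of the sub-behaviours can be specified by hand (hypothetical labels)."""
    COMPLETE = "behaviours completely specified"
    PARTIAL = "behaviours partially specified"
    LITTLE = "less domain knowledge available"


# Mapping copied from the rule of thumb in these notes; nothing else is implied.
HRL_CHOICES: Dict[DomainKnowledge, List[str]] = {
    DomainKnowledge.COMPLETE: ["Options"],
    DomainKnowledge.PARTIAL: ["HAM"],
    DomainKnowledge.LITTLE: ["MAXQ", "Learned Options"],
}


def suggest_hrl_algorithms(knowledge: DomainKnowledge) -> List[str]:
    """Return the algorithm families the notes associate with this knowledge level."""
    # Return a copy so callers cannot mutate the shared table.
    return list(HRL_CHOICES[knowledge])


if __name__ == "__main__":
    for level in DomainKnowledge:
        print(level.value, "->", ", ".join(suggest_hrl_algorithms(level)))
```

The lookup only restates the three cases listed above. It says nothing about how the algorithms compare beyond the amount of domain knowledge they need.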
Link: TheGradient
Comments:
Papers mentioned in this blog:
FeUdal Networks (FUN) for Hierarchical Reinforcement Learning
HIRO (Data-Efficient Hierarchical Reinforcement Learning)
HAC (Learning Multi-Level Hierarchies with Hindsight)
Locomotor Controllers
On Reinforcement Learning for Full-length Game of StarCraft
h-DQN
Meta Learning Shared Hierarchies (MLSH)
Modulated Policy Hierarchies (MPH)
Strategic Attentive Writer (STRAW) for Learning Macro-Actions
H-DRLN
Abstract Markov Decision Processes (AMDP)
Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)
HSP
Learning Representations in Model-Free HRL