RL in Large Discrete Action Spaces - Wolpertinger Agent

Deep Reinforcement Learning in Large Discrete Action Spaces

An implementation of the Wolpertinger architecture, tailored to work with MuJoCo. Tested on Inverted Pendulum, with the continuous action space discretized into 1e6 discrete actions; it converges in 50-100k steps. Making it work with other environments requires some adaptation, such as defining the kNN tree outside of the agent (ideally from the preset) and changing the proto-action width to match the action representation of the environment.
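For readers unfamiliar with the architecture, the following is a minimal sketch of the Wolpertinger action-selection step: the actor emits a continuous proto-action, the k nearest discrete actions are looked up, and the critic picks the best of them. It is not the code in this PR and does not use Coach's API. The function names, the brute-force NumPy lookup (standing in for a kNN tree), and the `q_fn` critic callback are all assumptions made for illustration.

```python
import numpy as np


def discretize_action_space(low, high, num_actions):
    """Return uniformly spaced 1-D discrete actions, shape (num_actions, 1).

    Hypothetical helper: a real preset would build this once, outside the agent.
    """
    return np.linspace(low, high, num_actions).reshape(-1, 1)


def k_nearest_actions(actions, proto_action, k):
    """Indices of the k discrete actions closest to proto_action.

    Brute-force Euclidean search; a kNN tree would replace this for speed.
    """
    dists = np.linalg.norm(actions - proto_action, axis=1)
    # argpartition places the k smallest distances in the first k slots.
    return np.argpartition(dists, k - 1)[:k]


def wolpertinger_select(actions, proto_action, q_fn, state, k=10):
    """Pick, among the k neighbours of the proto-action, the one with highest Q.

    q_fn(state, action) -> float is an assumed critic interface.
    """
    idx = k_nearest_actions(actions, proto_action, k)
    q_values = np.array([q_fn(state, actions[i]) for i in idx])
    return int(idx[np.argmax(q_values)])


if __name__ == "__main__":
    # Illustrative bounds and size only; the PR discretizes to 1e6 actions.
    actions = discretize_action_space(-3.0, 3.0, 7)
    # Toy critic that prefers actions close to 1.0 (assumption for the demo).
    toy_q = lambda state, action: -(action[0] - 1.0) ** 2
    chosen = wolpertinger_select(actions, np.array([0.4]), toy_q, state=None, k=3)
    print(chosen, actions[chosen])
```

The width of `proto_action` has to match the second dimension of `actions`, which is the "proto-action width" that needs changing for environments with a different action representation.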

IntelLabs / coach

RL in Large Discrete Action Spaces - Wolpertinger Agent #394