An implementation of the Wolpertinger architecture, tailored to work with MuJoCo. It was tested on Inverted Pendulum by discretizing the continuous action space into 1e6 discrete actions, and it converged in 50-100k steps. Making it work with other environments requires some adjustments, such as defining the kNN tree outside of the agent (ideally in the preset) and changing the proto-action width to match the action representation of the environment.
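The discretization and nearest-neighbor lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of this implementation: the function names (`build_action_grid`, `k_nearest_actions`, `wolpertinger_select`) and the critic `toy_q` are hypothetical, the action range is arbitrary, and a brute-force NumPy search stands in for the kNN tree. The second dimension of the grid plays the role of the proto-action width and would need to match the environment's action representation.

```python
import numpy as np


def build_action_grid(low, high, num_actions):
    """Discretize a 1-D continuous action range into evenly spaced actions.

    Returns an array of shape (num_actions, 1); the second dimension is the
    action width and must match the width of the proto action.
    """
    return np.linspace(low, high, num_actions).reshape(-1, 1)


def k_nearest_actions(action_grid, proto_action, k):
    """Brute-force kNN: indices of the k grid actions closest to the proto
    action, ordered from nearest to farthest. (Assumption: a real setup
    would likely use a tree or index structure instead.)"""
    distances = np.linalg.norm(action_grid - proto_action, axis=1)
    nearest = np.argpartition(distances, k - 1)[:k]
    return nearest[np.argsort(distances[nearest])]


def wolpertinger_select(action_grid, proto_action, q_function, state, k):
    """Pick, among the k nearest discrete actions, the one the critic rates
    highest. `q_function(state, action)` is a hypothetical critic interface."""
    candidates = action_grid[k_nearest_actions(action_grid, proto_action, k)]
    q_values = np.array([q_function(state, a) for a in candidates])
    return candidates[int(np.argmax(q_values))]


def toy_q(state, action):
    # Hypothetical stand-in for a learned critic: prefers actions near 0.5.
    return -float((action[0] - 0.5) ** 2)


# Illustrative range only; use the bounds of the actual environment.
grid = build_action_grid(-1.0, 1.0, 1_000_000)
proto = np.array([0.3])  # continuous proto action, as an actor would emit
chosen = wolpertinger_select(grid, proto, toy_q, state=None, k=10)
```

Building the grid and the lookup structure once, outside the agent, keeps the agent independent of any particular environment's action bounds, which is the motivation for moving the kNN tree into the preset.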
Deep Reinforcement Learning in Large Discrete Action Spaces