The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
Several state-of-the-art algorithms use search to improve a policy trained with RL (e.g. AlphaZero, Student of Games). The current implementation of ML-Agents does not seem to support this. On the other hand, the architecture looks like it could handle such algorithms well. For example, the AlphaZero training loop consists of agents that generate trajectories, which a learner then uses to update the policy.
Have you considered adding support for reinforcement learning with search algorithms? Or is it out of scope for the project?
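To make the request more concrete, here is a rough sketch of the actor/learner loop I have in mind. It is not ML-Agents code and uses none of its APIs: every name in it (`search`, `generate_trajectory`, `update`, `train`) is hypothetical. It is also a heavy simplification of AlphaZero: flat Monte Carlo rollouts stand in for MCTS, a tabular policy stands in for the network, and there is no value head. The toy game is a Nim variant where players remove 1 or 2 stones and whoever takes the last stone wins.

```python
import math
import random

ACTIONS = (1, 2)  # stones a player may remove per turn (toy game, assumption)


def legal_actions(pile):
    return [a for a in ACTIONS if a <= pile]


def make_policy(max_pile):
    """Tabular policy: uniform over the legal actions for every pile size."""
    return {p: {a: 1.0 / len(legal_actions(p)) for a in legal_actions(p)}
            for p in range(1, max_pile + 1)}


def rollout(pile, policy, rng):
    """Play to the end with the current policy.

    Returns +1 if the player to move at `pile` wins, otherwise -1.
    """
    sign = 1
    while pile > 0:
        acts = legal_actions(pile)
        weights = [policy[pile][a] for a in acts]
        pile -= rng.choices(acts, weights=weights)[0]
        sign = -sign
    # The pile is empty, so the player now to move has lost.
    return -sign


def search(pile, policy, rng, n_sims=20, temperature=0.5):
    """Flat Monte Carlo search (a stand-in for MCTS).

    Estimates each action's value with rollouts and returns a softmax
    over those values as the improved policy target.
    """
    values = {}
    for a in legal_actions(pile):
        # The rollout value is from the opponent's perspective, so negate it.
        total = sum(-rollout(pile - a, policy, rng) for _ in range(n_sims))
        values[a] = total / n_sims
    exps = {a: math.exp(v / temperature) for a, v in values.items()}
    norm = sum(exps.values())
    return {a: e / norm for a, e in exps.items()}


def generate_trajectory(policy, rng, start_pile):
    """Actor: self-play one game, recording (state, search target) pairs."""
    trajectory = []
    pile = start_pile
    while pile > 0:
        target = search(pile, policy, rng)
        trajectory.append((pile, target))
        acts = list(target)
        pile -= rng.choices(acts, weights=[target[a] for a in acts])[0]
    return trajectory


def update(policy, trajectory, lr=0.2):
    """Learner: move the policy toward the search-improved targets."""
    for pile, target in trajectory:
        for a, p in target.items():
            policy[pile][a] += lr * (p - policy[pile][a])


def train(iterations=200, start_pile=7, seed=0):
    rng = random.Random(seed)
    policy = make_policy(start_pile)
    for _ in range(iterations):
        update(policy, generate_trajectory(policy, rng, start_pile))
    return policy


if __name__ == "__main__":
    for pile, probs in sorted(train().items()):
        print(pile, probs)
```

The part that seems to map onto the existing architecture is the split between `generate_trajectory` (the agents) and `update` (the learner). The part that appears to be missing is a hook like `search`, which needs to step a copy of the environment forward from the current state before an action is chosen.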