AnSrwn opened this issue 4 years ago
SAC could be interesting to use, because our environment is complex and slow.
ML-Agents provides an implementation of two reinforcement learning algorithms:
- Proximal Policy Optimization (PPO)
- Soft Actor-Critic (SAC)
The default algorithm is PPO. This is a method that has been shown to be more general purpose and stable than many other RL algorithms.
In contrast with PPO, SAC is off-policy, which means it can learn from experiences collected at any time during the past. As experiences are collected, they are placed in an experience replay buffer and randomly drawn during training. This makes SAC significantly more sample-efficient, often requiring 5-10 times fewer samples to learn the same task as PPO. However, SAC tends to require more model updates. SAC is a good choice for heavier or slower environments (about 0.1 seconds per step or more). SAC is also a "maximum entropy" algorithm, and enables exploration in an intrinsic way. Read more about maximum entropy RL here.
(Quoted from https://github.com/Unity-Technologies/ml-agents/blob/master/docs/ML-Agents-Overview.md#deep-reinforcement-learning)
The easiest choice is probably SAC, but more research should be done before deciding which one to use.
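For reference, switching between the two algorithms in ML-Agents is mainly a matter of the trainer configuration file. Below is a minimal sketch of what a SAC entry could look like, based on the trainer-config format described in the Training-ML-Agents doc linked below; the behavior name, key names, and values here are assumptions and would need to be checked against the ML-Agents version we actually use.

```yaml
# Hypothetical trainer_config.yaml sketch; keys and defaults vary between ML-Agents releases.
# "MyAgentBehavior" is a placeholder for the Behavior Name set on the agent in Unity.
MyAgentBehavior:
  trainer: sac            # switch to "ppo" to use the default algorithm instead
  batch_size: 128         # samples drawn from the replay buffer per model update
  buffer_size: 50000      # size of the experience replay buffer (SAC is off-policy)
  buffer_init_steps: 1000 # steps collected before the first model update
  learning_rate: 3.0e-4
  max_steps: 5.0e5
  time_horizon: 64
  summary_freq: 10000
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
```

Training would then be started with something like `mlagents-learn trainer_config.yaml --run-id=sac_test`; the exact CLI flags also differ between releases, so this is only meant to illustrate where the PPO/SAC choice is made.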
More information:
- https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-ML-Agents.md#training-configurations
- https://github.com/Unity-Technologies/ml-agents/blob/master/docs/ML-Agents-Overview.md
Document training.