AnSrwn / Parkr

Parkr is a simulation in which a reinforcement learning agent learns how to park a car. In order to do this, Unity ML-Agents is used.

Train model with SAC or GAIL #10

Open AnSrwn opened 4 years ago

AnSrwn commented 4 years ago

The next step would be to train an agent with two optimization algorithms. For this, you could use the PPO and DQN algorithms from the reinforcement learning area. Alternatively, you could run the training with two different optimizers, e.g. Adam or RMSProp (this also applies to supervised learning). You are also welcome to compare two fundamentally different architectures for your agents (such as MLP and CNN, or a plain CNN and a CNN with skip connections), provided they could yield a sensible solution to your problem.

SAC is probably the easiest option, but more research should be done before deciding which one to use.

More information:
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-ML-Agents.md#training-configurations
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/ML-Agents-Overview.md

Document training.

AnSrwn commented 4 years ago

SAC could be interesting to use because the environment is complex and slow.

ML-Agents provides implementations of two reinforcement learning algorithms:

Proximal Policy Optimization (PPO)
Soft Actor-Critic (SAC)

The default algorithm is PPO. This is a method that has been shown to be more general purpose and stable than many other RL algorithms.

In contrast with PPO, SAC is off-policy, which means it can learn from experiences collected at any time during the past. As experiences are collected, they are placed in an experience replay buffer and randomly drawn during training. This makes SAC significantly more sample-efficient, often requiring 5-10 times fewer samples to learn the same task as PPO. However, SAC tends to require more model updates. SAC is a good choice for heavier or slower environments (about 0.1 seconds per step or more). SAC is also a "maximum entropy" algorithm, and enables exploration in an intrinsic way. Read more about maximum entropy RL here. https://github.com/Unity-Technologies/ml-agents/blob/master/docs/ML-Agents-Overview.md#deep-reinforcement-learning
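
To make the replay buffer idea concrete, here is a minimal Python sketch of storing experiences and drawing them at random. It is not the ML-Agents implementation. The class name `ReplayBuffer`, its methods, and the transition layout `(state, action, reward, next_state, done)` are assumptions chosen for illustration only.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity store of past transitions (illustration only)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Store one transition; it can be reused for many later updates.
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw a uniform random batch without replacement, regardless of
        # when each transition was collected (this is what "off-policy"
        # reuse of old experience relies on).
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


buffer = ReplayBuffer(capacity=1000, seed=0)
for step in range(10):
    # Placeholder integers stand in for real observations and actions.
    buffer.add(state=step, action=0, reward=0.0, next_state=step + 1, done=False)

batch = buffer.sample(batch_size=4)
```

Because each stored transition can be sampled many times, an off-policy learner needs fewer environment steps, which is why this matters for a slow simulation.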