Parkr is a simulation in which a reinforcement learning agent learns how to park a car, built with Unity ML-Agents.
A model trained with PPO is included. So far it has not learned much; it only drives in circles.
Strategy | Outcome | Learning |
---|---|---|
Positive rewards for reaching the target and negative rewards for collisions. | Agent accumulates too many negative rewards through collisions, so it learns to leave the map. | First train without negative rewards for collisions. |
Constantly giving a negative reward (time penalty), so that the agent reaches the target faster and more directly. | Agent learns to leave the map as fast as possible to cut the accumulated time penalty short. | First train without a time penalty. |
Giving rewards for intermediate goals, so that the agent learns to move toward the target. Checkpoints every 5 meters, every 1 meter, and per action were tried; the closer the agent got to the target, the more reward it received. | Agent learns to drive in circles, because that still generates reward. | Could be a solution if each intermediate goal can only be reached once. |
Only giving rewards for reaching the target and using the curiosity module (different curiosity strengths were tested). | No training improvement was observed. | Might show an effect if training continues for a few more days. |
Reducing complexity: start simple and build up. First stage: agent needs to reach the target without any obstacles. Second stage: include obstacles. Third stage: negative reward for collisions with obstacles. Fourth stage: time penalty. | Agent could not even complete the first stage. | Seems logical and would probably work, but a lot of training time is needed. |
Increasing the number of hidden layers: changed num_layers from 2 to 3. | Learning seemed to improve. | If the problem is complex, more hidden layers are needed. |
Simplifying observations: the relative position to the target is maybe not clear enough, so the new observations are the absolute positions of the agent and the target. | No training improvement was observed. | Keeping the observations simple is probably always good, but whether absolute positions are better than relative ones could not be confirmed. |
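The checkpoint idea from the table (intermediate goals that can only be collected once) can be sketched as follows. This is a minimal illustration, not Parkr's actual reward code: the distance thresholds and reward values are assumptions, and the real project computes rewards in its C# agent, not in Python.

```python
class CheckpointReward:
    """One-time checkpoint rewards: each distance threshold pays out only
    once per episode, so driving in circles cannot farm reward.
    Thresholds and reward magnitudes here are illustrative assumptions."""

    def __init__(self, thresholds=(15.0, 10.0, 5.0, 1.0)):
        # Distances to the target (in meters) that trigger a reward.
        self.thresholds = sorted(thresholds, reverse=True)
        self.reached = set()

    def step(self, distance_to_target):
        """Return the reward for this step given the current distance."""
        reward = 0.0
        for t in self.thresholds:
            if distance_to_target <= t and t not in self.reached:
                self.reached.add(t)        # pay out each checkpoint once
                reward += 0.1
        if distance_to_target <= 0.5:      # close enough: target reached
            reward += 1.0
        return reward
```

With this scheme, circling at a fixed distance yields zero reward after the first pass, which removes the exploit described in the third row of the table.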
Training with `mlagents-learn` writes trained models to `models` and TensorBoard summaries to `summaries` in the root folder.

```
mlagents-learn ./ParkrCar.yaml --run-id ParkrCar --train
tensorboard --logdir=summaries --port=6006
```
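For reference, a minimal `ParkrCar.yaml` in the trainer-config format of the ML-Agents version that still accepted the `--train` flag might look like the sketch below. Only `num_layers: 3` comes from the experiments above; every other value is an assumed placeholder, not the project's actual configuration.

```yaml
# Illustrative trainer config; only num_layers: 3 is taken from the
# experiments above, all other values are assumed placeholders.
ParkrCar:
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    learning_rate: 3.0e-4
    hidden_units: 128
    num_layers: 3
    time_horizon: 64
    max_steps: 5.0e5
```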
The agent's observation space has 14 observations, and its action space has 2 actions.