Train model with PPO trainer

AnSrwn / Parkr

Parkr is a simulation in which a reinforcement learning agent learns to park a car. The agent is trained with Unity ML-Agents.


Train model with PPO trainer #9

Open AnSrwn opened 4 years ago

AnSrwn commented 4 years ago

More information: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-ML-Agents.md#training-configurations

Document training.

Learnings so far:

max_steps in the config file sets the total number of environment steps after which training stops. Increase it if the agent needs more training.
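
As a rough illustration of where max_steps sits, here is a minimal Python sketch that mirrors a trainer config as a dict and raises the limit. It assumes the `behaviors` layout used by recent ML-Agents releases (older versions used a different layout). The behavior name `ParkrAgent`, the step counts, and the helper `with_more_steps` are placeholders for illustration, not values from this repo.

```python
import copy

# Mirrors the YAML trainer config as a plain dict.
# Assumption: the "behaviors" layout of recent ML-Agents releases;
# "ParkrAgent" and the numbers are placeholders, not this repo's values.
config = {
    "behaviors": {
        "ParkrAgent": {
            "trainer_type": "ppo",
            "max_steps": 500_000,  # training stops after this many environment steps
        }
    }
}


def with_more_steps(cfg: dict, behavior: str, factor: float) -> dict:
    """Return a copy of cfg with max_steps scaled by factor (hypothetical helper)."""
    new_cfg = copy.deepcopy(cfg)  # leave the original config untouched
    settings = new_cfg["behaviors"][behavior]
    settings["max_steps"] = int(settings["max_steps"] * factor)
    return new_cfg


longer = with_more_steps(config, "ParkrAgent", 2)
print(longer["behaviors"]["ParkrAgent"]["max_steps"])
```

If the reward curve is still rising when training stops, doubling the limit like this is a cheap first experiment before touching other hyperparameters.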

AnSrwn commented 4 years ago

Consider enabling the curiosity reward signal if the environment gives too little reward feedback, i.e. rewards are sparse (https://github.com/Unity-Technologies/ml-agents/blob/master/docs/ML-Agents-Overview.md#curiosity-for-sparse-reward-environments)
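
A minimal sketch of what enabling curiosity could look like, again with the config mirrored as a Python dict. It assumes the `reward_signals` section with `strength` and `gamma` keys as described in the ML-Agents training configuration docs. The behavior name `ParkrAgent`, the helper `with_curiosity`, and the numeric values are illustrative assumptions, not tuned settings from this project.

```python
import copy

# Assumption: "behaviors" / "reward_signals" layout of recent ML-Agents releases.
# "ParkrAgent" is a placeholder behavior name.
base_config = {
    "behaviors": {
        "ParkrAgent": {
            "trainer_type": "ppo",
            "reward_signals": {
                "extrinsic": {"strength": 1.0, "gamma": 0.99},
            },
        }
    }
}


def with_curiosity(cfg: dict, behavior: str,
                   strength: float = 0.02, gamma: float = 0.99) -> dict:
    """Return a copy of cfg with a curiosity reward signal added (hypothetical helper)."""
    new_cfg = copy.deepcopy(cfg)
    signals = new_cfg["behaviors"][behavior].setdefault("reward_signals", {})
    # Keep the intrinsic signal small so it does not drown out the task reward.
    signals["curiosity"] = {"strength": strength, "gamma": gamma}
    return new_cfg


curious = with_curiosity(base_config, "ParkrAgent")
print(sorted(curious["behaviors"]["ParkrAgent"]["reward_signals"]))
```

The extrinsic signal stays in place. Curiosity is added alongside it rather than replacing it.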

AnSrwn commented 4 years ago

Now that I think about it, maybe the penalty for going away is the problem here. I think it is generally better to avoid giving an agent penalties that are caused by its own actions. I.e.:

DO: Give the agent a small penalty every timestep no matter what (to encourage finishing faster).

DON'T: Give the agent a penalty when it takes an action that you don't want it to take.

https://github.com/Unity-Technologies/ml-agents/issues/1457
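
The DO/DON'T advice above can be sketched as a reward function. This is a plain-Python illustration of the idea, not the agent code of this repo (which would live in the Unity project). The function name `step_reward`, the `parked` flag, the goal bonus of 1.0, and the default `max_step` are all assumptions.

```python
def step_reward(parked: bool, max_step: int = 5000) -> float:
    """Reward for a single timestep of a hypothetical parking episode."""
    # DO: constant time penalty, applied every step regardless of the action.
    # Scaled so that a full episode of max_step steps costs at most 1.0 in total.
    reward = -1.0 / max_step
    # Positive reward only for reaching the goal.
    if parked:
        reward += 1.0
    # DON'T: no extra penalty here for "unwanted" actions such as
    # driving away from the parking spot.
    return reward


# An episode that parks on step 200 of at most 1000 steps.
episode = [step_reward(parked=(t == 199), max_step=1000) for t in range(200)]
print(round(sum(episode), 3))
```

Because the time penalty is independent of the action, the only way for the agent to reduce it is to finish sooner, which is the behavior the comment above is aiming for.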