Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Beta parameter briefing. #1031

Closed Ilpolainen closed 6 years ago

Ilpolainen commented 6 years ago

Hi!

I want to make the entropy value go up again while training with the --load parameter. What kind of parameter is beta, exactly? Is it a constant by which the random-action percentage is periodically decreased, or something else? Maybe like this:

newRandomPercent = oldRandomPercent * beta ?

OR

newRandomPercent = oldRandomPercent - beta ? (clamped)

If it's something else and very complicated, I don't expect a detailed answer. Also, the PPO documentation doesn't mention what the starting percentage is. Maybe 100%?

Fantastic work you've done by the way!

vincentpierre commented 6 years ago

Hi, Beta corresponds to a regularization parameter; it is not a percentage. When you compute the loss in PPO (what the RL algorithm is trying to minimize), you add a term (the entropy regularization) that encourages the agent to take more random actions. I think what you have in mind is epsilon-greedy, which does not work for on-policy algorithms.

If you want to load a model and make the entropy go up again, that would correspond to having the agent forget its past training. I am not sure this is what you want.

You could change the value of Beta during training. We make Beta decrease linearly with time during training, so if you increase max_step, Beta will increase a little (because the number of steps required to decay to 0 has changed). If this is not enough, you could load a training session with an increased Beta for a few steps, then, when the entropy has risen to the level you want, set it back to its original value. I hope this helps.
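To make the role of Beta concrete, here is a minimal sketch of the idea, assuming a generic PPO setup; the function names and the `value_coeff` default are illustrative, not ML-Agents' actual internals:

```python
def linear_beta(initial_beta, step, max_step):
    """Linearly anneal beta from initial_beta toward 0 over max_step steps."""
    return initial_beta * max(0.0, 1.0 - step / max_step)


def ppo_loss(policy_loss, value_loss, entropy, beta, value_coeff=0.5):
    """Combined PPO objective to minimize.

    Subtracting beta * entropy lowers the loss for higher-entropy (more
    random) policies, so a larger beta encourages more exploration.
    """
    return policy_loss + value_coeff * value_loss - beta * entropy
```

Under a linear schedule like this, a resumed run keeps its current `step` but recomputes the decay against the new `max_step`, so raising `max_step` bumps the effective beta up slightly, which is the effect described above.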

Ilpolainen commented 6 years ago

Hi!

Thanks a lot!

The entropy did indeed rise again, and now I know a little bit more about how your algorithms work. I'll have to refresh my memory about entropy and read a little. The curriculum learning is fantastic, by the way. I'm also looking forward to trying the imitation learning.

gaoyuankidult commented 6 years ago

@vincentpierre Why does increasing Beta encourage the agent to take more random actions? From the PPO paper, since the penalty term -beta * KL is used, increasing beta will force the KL to be small, which should encourage less random actions, right?

vincentpierre commented 6 years ago

KL is not the entropy in the PPO paper; it is a measure of the distance between the old policy and the new one. I think they name their entropy regularization coefficient differently in that paper.
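For reference, a sketch of the two objectives from the PPO paper (Schulman et al., 2017), where β weights the adaptive KL penalty and c_2 weights the entropy bonus S:

```latex
% KL-penalized objective: here beta penalizes divergence from the
% old policy, so a larger beta keeps updates conservative.
L^{\mathrm{KLPEN}}(\theta) = \hat{\mathbb{E}}_t\left[
  \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t
  - \beta\,\mathrm{KL}\big[\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\big]
\right]

% Clipped objective with value loss and entropy bonus: here c_2
% rewards high entropy, i.e. more random actions.
L^{\mathrm{CLIP+VF+S}}(\theta) = \hat{\mathbb{E}}_t\left[
  L_t^{\mathrm{CLIP}}(\theta) - c_1 L_t^{\mathrm{VF}}(\theta) + c_2\,S[\pi_\theta](s_t)
\right]
```

ML-Agents' beta plays the role of the entropy coefficient c_2 here, not the KL penalty coefficient β, which is why increasing it makes the policy more random rather than less.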

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.