Open quintus0505 opened 3 years ago
The `dist_entropy` term is described in the original PPO paper (https://arxiv.org/pdf/1707.06347.pdf); see Equation 9. The trick has been used in earlier papers as well; its purpose is to encourage exploration.

We set `entropy_coef` to 0 because that is already enough to solve the task. But if the policy gets stuck in a local minimum, increasing this coefficient might help find a better solution.
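To make the role of the coefficient concrete, here is a minimal sketch of how the entropy bonus typically enters the PPO objective (this is illustrative, not the repo's actual code; the function name and default coefficients are assumptions):

```python
def ppo_loss(action_loss, value_loss, dist_entropy,
             value_coef=0.5, entropy_coef=0.01):
    # Total loss to minimize: the entropy term is *subtracted*, so a
    # higher-entropy (more exploratory) policy lowers the loss.
    # With entropy_coef=0 the entropy bonus has no effect on gradients.
    return action_loss + value_coef * value_loss - entropy_coef * dist_entropy
```

With `entropy_coef=0` the last term vanishes, which is why `dist_entropy` is computed but does not influence training by default.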
Thanks!
Hi, I am reading your code and have a question about `evaluate_actions` when updating PPO: I notice that you compute `dist_entropy` along with the action and value losses, and it takes part in backpropagation. Although `dist_entropy` has no effect in your code, since `entropy_coef` defaults to 0, I am still curious how it works and why you use it (and what exactly the comment "An ugly hack for my KFAC implementation." means :stuck_out_tongue_closed_eyes:). Thanks!
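For reference, the `dist_entropy` of a discrete (categorical) policy is just the Shannon entropy of the action distribution. A minimal stdlib-only sketch of the computation (the actual repo uses PyTorch distributions; this function name is illustrative):

```python
import math

def categorical_entropy(logits):
    # Softmax over logits (with max-subtraction for numerical stability),
    # then H = -sum(p * log p). Uniform logits give maximum entropy;
    # a near-deterministic policy gives entropy near zero.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs)
```

A uniform policy over `n` actions has entropy `log(n)`, the maximum, which is what the entropy bonus pushes the policy toward.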