electronicarts / character-motion-vaes

Character Controllers using Motion VAEs
BSD 3-Clause "New" or "Revised" License

Question about "dist_entropy" when updating ppo #3

Open quintus0505 opened 3 years ago

quintus0505 commented 3 years ago

Hi, I am reading your code and have a question about evaluate_actions in the PPO update:

I notice that you compute dist_entropy along with the action and value losses, which are used in backpropagation. Although dist_entropy has no effect in your code because entropy_coef is 0 by default, I am still curious how it works and why you use it. (Also, what exactly is "An ugly hack for my KFAC implementation."? :stuck_out_tongue_closed_eyes:)

Thanks

belinghy commented 3 years ago

The dist_entropy term is described in the original PPO paper (https://arxiv.org/pdf/1707.06347.pdf); see Equation 9. The trick has been used in earlier papers as well. The purpose is to encourage exploration.
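
For reference, Equation 9 of the PPO paper combines the clipped surrogate objective, the value-function error, and an entropy bonus, where $c_1$ and $c_2$ are coefficients and $S$ is the entropy of the policy at state $s_t$. In this notation, dist_entropy plays the role of $S$ and entropy_coef the role of $c_2$ (that mapping is my reading of the thread, not something stated in the paper):

$$
L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 \, S[\pi_\theta](s_t) \right]
$$

The paper maximizes this objective, so a larger entropy is rewarded whenever $c_2 > 0$; with $c_2 = 0$ the entropy term drops out entirely.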

We set entropy_coef to 0 because the task can already be solved without the entropy bonus. But if the policy gets stuck in a local minimum, increasing this coefficient might help it find a better solution.
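
As a rough illustration of how such a term typically enters a PPO loss, here is a minimal sketch in plain Python. It is not the code from this repository: the helper names `gaussian_entropy` and `ppo_total_loss`, the `value_loss_coef` parameter, and the assumption of a diagonal Gaussian policy are all hypothetical. Only `entropy_coef` and `dist_entropy` are names taken from the discussion above.

```python
import math


def gaussian_entropy(stds):
    """Entropy of a diagonal Gaussian with the given per-dimension std devs.

    Hypothetical helper: assumes the policy is a diagonal Gaussian.
    """
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)


def ppo_total_loss(action_loss, value_loss, dist_entropy,
                   value_loss_coef=0.5, entropy_coef=0.0):
    """Scalar loss to minimize (hypothetical helper).

    The entropy is subtracted, so a higher-entropy (more exploratory)
    policy yields a lower loss whenever entropy_coef > 0.
    """
    return action_loss + value_loss_coef * value_loss - entropy_coef * dist_entropy


# A wide policy has more entropy than a narrow one.
wide = gaussian_entropy([1.0, 1.0])
narrow = gaussian_entropy([0.1, 0.1])

# With entropy_coef = 0 (the default discussed above) the entropy is ignored.
loss_no_bonus = ppo_total_loss(1.0, 2.0, wide)

# With a positive coefficient, the wider policy is preferred.
loss_wide = ppo_total_loss(1.0, 2.0, wide, entropy_coef=0.01)
loss_narrow = ppo_total_loss(1.0, 2.0, narrow, entropy_coef=0.01)
```

With the coefficient at 0 the gradient of the loss does not depend on the entropy at all, which matches the observation in the question that dist_entropy currently has no effect.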

quintus0505 commented 2 years ago

Thanks!