Open quintus0505 opened 3 years ago
The `dist_entropy` term is described in the original PPO paper (https://arxiv.org/pdf/1707.06347.pdf); see Equation 9. The trick has been used in earlier papers as well; its purpose is to encourage exploration.

We set `entropy_coef` to 0 because that is already enough to solve the task. But if the policy gets stuck in a local minimum, increasing this coefficient might help find a better solution.
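To make the role of the coefficient concrete, here is a minimal sketch of how the entropy bonus typically enters the PPO objective (this is illustrative, not the repo's actual code; the function name and default coefficients are assumptions):

```python
def ppo_loss(action_loss, value_loss, dist_entropy,
             value_coef=0.5, entropy_coef=0.01):
    # Total loss to minimize: the entropy term is *subtracted*, so a
    # higher-entropy (more exploratory) policy lowers the loss.
    # With entropy_coef=0 the entropy bonus has no effect on gradients.
    return action_loss + value_coef * value_loss - entropy_coef * dist_entropy
```

With `entropy_coef=0` the last term vanishes, which is why `dist_entropy` is computed but does not influence training by default.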
Thanks!
Hi, I am reading your code and have a question about `evaluate_actions` when updating PPO: I notice that you compute `dist_entropy` along with the action and value losses, and it takes part in backpropagation. Although `dist_entropy` has no effect in your code, since `entropy_coef` defaults to 0, I am still curious how it works and why you use it (and what exactly the comment "An ugly hack for my KFAC implementation." means :stuck_out_tongue_closed_eyes:). Thanks!
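For reference, the `dist_entropy` of a discrete (categorical) policy is just the Shannon entropy of the action distribution. A minimal stdlib-only sketch of the computation (the actual repo uses PyTorch distributions; this function name is illustrative):

```python
import math

def categorical_entropy(logits):
    # Softmax over logits (with max-subtraction for numerical stability),
    # then H = -sum(p * log p). Uniform logits give maximum entropy;
    # a near-deterministic policy gives entropy near zero.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs)
```

A uniform policy over `n` actions has entropy `log(n)`, the maximum, which is what the entropy bonus pushes the policy toward.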