Closed sinaqahremani closed 3 years ago
Hello, I have 2 questions about the implementation of PPO.
1. What is dist_entropy, which is used in the evaluate_action method of CNNPolicy (implemented in net.py) and in the ppo_update_stage1 function?
2. What is the theoretical background of this line in ppo.py?
loss = policy_loss + 20 * value_loss - coeff_entropy * dist_entropy
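For context, here is a minimal sketch of how a loss of this shape is usually put together in PPO. It assumes the policy is a diagonal Gaussian, so that dist_entropy would be the entropy of that action distribution; the thread does not confirm this. The helper names (gaussian_entropy, clipped_policy_loss, value_loss_mse, total_loss) and the default values for clip_value and coeff_entropy are hypothetical and are not taken from net.py or ppo.py. Only the weight 20 and the overall form of the sum come from the line quoted above.

```python
import math


def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian, given per-dimension log standard deviations.

    Assumption: the policy outputs a diagonal Gaussian over actions.
    Each dimension contributes 0.5 + 0.5 * log(2 * pi) + log(sigma).
    """
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + ls for ls in log_std)


def clipped_policy_loss(ratios, advantages, clip_value=0.1):
    """PPO clipped surrogate objective, negated so that it can be minimized.

    ratios are pi_new(a|s) / pi_old(a|s); clip_value is a hypothetical default.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_value, min(1.0 + clip_value, r))
        # Take the more pessimistic of the unclipped and clipped estimates.
        terms.append(min(r * a, clipped_r * a))
    return -sum(terms) / len(terms)


def value_loss_mse(values, targets):
    """Mean squared error between predicted state values and return targets."""
    return sum((v - t) ** 2 for v, t in zip(values, targets)) / len(values)


def total_loss(policy_loss, value_loss, dist_entropy,
               value_coeff=20.0, coeff_entropy=5e-4):
    """Weighted sum with the same shape as the line quoted from ppo.py.

    Subtracting the entropy term means that, when this loss is minimized,
    higher policy entropy (more exploration) is rewarded.
    coeff_entropy=5e-4 is an illustrative value, not the repository's setting.
    """
    return policy_loss + value_coeff * value_loss - coeff_entropy * dist_entropy
```

Under these assumptions the three terms have the usual PPO roles: policy_loss pushes the policy toward actions with positive advantage while limiting how far the probability ratio can move, value_loss fits the critic (here weighted by 20), and the entropy term acts as a small exploration bonus.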