Question about the PPO algorithm

Acmece / rl-collision-avoidance

Implementation of the paper "Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning"

https://arxiv.org/abs/1709.10082


Closed: sinaqahremani closed this issue 3 years ago

sinaqahremani commented 3 years ago

Hello, I have 2 questions about the implementation of PPO.

What is dist_entropy, which is used in the evaluate_action method of CNNPolicy (implemented in net.py) and in the ppo_update_stage1 function?

I also need the theoretical background for this line in ppo.py: loss = policy_loss + 20 * value_loss - coeff_entropy * dist_entropy
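
For context, the quoted line has the same shape as the combined objective in the PPO paper (Schulman et al., 2017), which maximizes the clipped surrogate minus a weighted value error plus a weighted entropy bonus. Written as a loss to minimize, the signs flip, which gives policy loss plus value loss minus entropy. The following is only a rough sketch of that structure in plain Python, not the repository's code. It assumes a diagonal Gaussian policy (so the entropy depends only on the log standard deviation), and the function names, the clip value, and the default coeff_entropy are hypothetical. Only the factor 20 comes from the line quoted above.

```python
import math

def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian, summed over action dimensions.

    Assumption: the policy is a diagonal Gaussian; per dimension the
    entropy is 0.5 + 0.5 * log(2 * pi) + log(sigma).
    """
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + ls for ls in log_std)

def ppo_loss(ratios, advantages, values, returns, entropy,
             clip=0.1, value_coeff=20.0, coeff_entropy=5e-4):
    """Combined PPO loss for one minibatch (hypothetical helper).

    clip and coeff_entropy are illustrative defaults, not values taken
    from the repository; value_coeff=20 mirrors the quoted line.
    """
    n = len(ratios)
    # Clipped surrogate: take the more pessimistic of the raw and
    # clipped probability-ratio terms for each sample.
    surrogate = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip, min(1.0 + clip, r))
        surrogate += min(r * a, clipped * a)
    policy_loss = -surrogate / n  # minimizing this maximizes the surrogate
    # Mean squared error between predicted values and target returns.
    value_loss = sum((v - g) ** 2 for v, g in zip(values, returns)) / n
    # Subtracting the entropy rewards more stochastic (exploratory) policies.
    return policy_loss + value_coeff * value_loss - coeff_entropy * entropy
```

Read this way, dist_entropy would be the entropy of the action distribution, and subtracting it from the loss acts as an exploration bonus that discourages the policy from collapsing to a near-deterministic one too early. The weight on value_loss only balances the scale of the critic error against the policy term.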