Acmece / rl-collision-avoidance

Implementation of the paper "Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning"
https://arxiv.org/abs/1709.10082

Question about PPO algorithm #21

Closed: sinaqahremani closed this issue 3 years ago

sinaqahremani commented 3 years ago

Hello, I have 2 questions about the implementation of PPO.

First, what is dist_entropy? It is computed in the evaluate_action method of CNNPolicy (implemented in net.py) and used in the ppo_update_stage1 function.
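For context, here is a minimal sketch. It assumes that CNNPolicy samples continuous actions from a diagonal Gaussian whose mean comes from the network and whose log standard deviation is a learned parameter. That is a common PPO setup, but this sketch does not verify it against net.py. Under that assumption, dist_entropy would be the differential entropy of the Gaussian, averaged over the batch. Each action dimension contributes 0.5 + 0.5 log(2π) + log σ, and the mean does not appear at all. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math

def diag_gaussian_entropy(log_std):
    """Differential entropy (nats) of a diagonal Gaussian N(mu, diag(sigma^2)).

    Hypothetical helper: each independent dimension contributes
    0.5 + 0.5*log(2*pi) + log(sigma); the mean mu does not affect entropy.
    """
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + ls for ls in log_std)

def batch_mean_entropy(log_std_batch):
    """Average per-sample entropy over a batch, giving one scalar (assumed
    to play the role of dist_entropy)."""
    return sum(diag_gaussian_entropy(ls) for ls in log_std_batch) / len(log_std_batch)

# Example: a 2-D action (e.g. linear and angular velocity) with sigma = 1 per dim.
print(diag_gaussian_entropy([0.0, 0.0]))  # equals 1 + log(2*pi)
```

A larger σ means a more random policy and a larger entropy. This is why subtracting an entropy term from the loss rewards exploration and discourages σ from collapsing too early.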

Second, what is the theoretical background of this line in ppo.py?
loss = policy_loss + 20 * value_loss - coeff_entropy * dist_entropy
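In PPO (Schulman et al., 2017), the objective to maximize is L^CLIP - c1 * L^VF + c2 * S. It has three parts: a clipped surrogate policy objective, a weighted value-function error that is subtracted, and an entropy bonus that is added. Negating this objective gives a loss to minimize. That loss has the same shape as the line above if three things hold:

- policy_loss is the negated clipped surrogate.
- 20 plays the role of c1.
- coeff_entropy plays the role of c2.

The sketch below assumes that reading. It uses plain Python lists and a mean-squared-error value loss. The defaults clip_eps=0.2 and entropy_coeff=0.01 are illustrative, and the function name ppo_total_loss is hypothetical.

```python
import math

def ppo_total_loss(new_logp, old_logp, advantages, values, returns,
                   entropy, clip_eps=0.2, value_coeff=20.0, entropy_coeff=0.01):
    """Hypothetical combined PPO loss: clipped policy loss + c1 * value loss - c2 * entropy.

    Per-sample inputs are equal-length lists of floats; entropy is a scalar
    (e.g. the batch-mean entropy of the action distribution).
    clip_eps and entropy_coeff defaults are illustrative assumptions.
    """
    n = len(advantages)
    policy_terms = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                          # r_t = pi_new(a|s) / pi_old(a|s)
        surr1 = ratio * adv                                # unclipped surrogate
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        surr2 = clipped * adv                              # clipped surrogate
        policy_terms.append(min(surr1, surr2))             # pessimistic lower bound
    policy_loss = -sum(policy_terms) / n                   # negate: maximize objective = minimize loss
    # Assumed value loss: mean squared error between predicted values and returns.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    # Entropy is subtracted, so higher entropy lowers the loss (exploration bonus).
    return policy_loss + value_coeff * value_loss - entropy_coeff * entropy
```

Taking the minimum of the two surrogates makes the bound pessimistic. Clipping removes any gain from pushing the probability ratio outside [1 - ε, 1 + ε], but it never hides a change that makes the objective worse. If the actor and critic share no parameters, the value coefficient only rescales the critic's effective learning rate. If they share layers, it also sets the trade-off between fitting values and improving the policy.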