Denys88 / rl_games

RL implementations
MIT License

Question about the implementation of PPO actor loss function #229

Closed · ZiyiLiubird closed this issue 1 year ago

ZiyiLiubird commented 1 year ago

https://github.com/Denys88/rl_games/blob/fa1c13cc10158ab42e829a0971a9dc9a4544b3b9/rl_games/common/common_losses.py#L24

In the PPO paper, the importance sampling ratio is pi_new(a|s) / pi_old(a|s), which in log space is exp(action_log_probs - old_action_log_probs_batch). The code here computes exp(old_action_log_probs_batch - action_log_probs), i.e., the difference is taken in the opposite order. So I wonder why the implementation is written this way. Thanks!
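
For reference, a minimal sketch of the clipped PPO actor loss with the ratio written in the conventional log-space order; the function and variable names here are illustrative, not the exact rl_games code:

```python
import torch

def ppo_actor_loss(logp, old_logp, advantage, clip_eps=0.2):
    # Conventional form: ratio = pi_new(a|s) / pi_old(a|s),
    # computed in log space for numerical stability.
    ratio = torch.exp(logp - old_logp)
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # PPO maximizes the clipped surrogate, so the loss is its negation.
    return -torch.min(surr1, surr2).mean()
```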

Denys88 commented 1 year ago

Mine are negative log probabilities, so -b - (-a) == a - b. It is the same :)
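
In other words, when the stored values are negative log probs (neglogp), subtracting them in the opposite order recovers the standard ratio. A small numerical check (illustrative values, not the repo's code):

```python
import torch

logp, old_logp = torch.tensor(-1.2), torch.tensor(-0.8)
neglogp, old_neglogp = -logp, -old_logp

# exp(old_neglogp - neglogp) == exp(logp - old_logp): same ratio.
assert torch.allclose(torch.exp(old_neglogp - neglogp),
                      torch.exp(logp - old_logp))
```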

ZiyiLiubird commented 1 year ago

thanks a lot!