In the PPO paper, the importance sampling ratio is r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t), i.e. exp(action_log_probs - old_action_log_probs_batch) in log space, not exp(old_action_log_probs_batch - action_log_probs). So I wonder why the implementation here uses the reversed order. Thanks!
https://github.com/Denys88/rl_games/blob/fa1c13cc10158ab42e829a0971a9dc9a4544b3b9/rl_games/common/common_losses.py#L24
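For reference, here is how I understand the paper's ratio in log space (a minimal sketch, assuming both tensors store plain log-probabilities rather than negative log-probabilities; the dummy values are just for illustration):

```python
import torch

# PPO paper: r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t)
#                       = exp(log pi_theta(a_t|s_t) - log pi_theta_old(a_t|s_t))
action_log_probs = torch.tensor([-0.9, -1.2])             # log-probs under current policy (dummy values)
old_action_log_probs_batch = torch.tensor([-1.0, -1.0])   # log-probs under old policy (dummy values)

ratio = torch.exp(action_log_probs - old_action_log_probs_batch)
print(ratio)
```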