Datoclement opened this issue 6 years ago (status: Open)
This line came directly from the Unity implementation: https://github.com/Unity-Technologies/ml-agents/blob/master/python/unitytrainers/ppo/models.py
They have since updated that file, but the same line is still there, so I suggest reporting the issue there.
I have been reading your code (which is clean and clear, by the way) and have a question about one line.
In the file PPO/ppo/model.py, line 185:
r_theta = probs / (old_probs + 1e-10)
Would it be more accurate to change it to
r_theta = tf.reduce_prod(probs,axis=-1) / (tf.reduce_prod(old_probs,axis=-1) + 1e-10)
?
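To make the difference between the two expressions concrete, here is a small sketch in plain Python. It assumes the last axis of `probs` holds one probability per independent action branch, which is my reading of the code and not something I have confirmed. The function names `per_branch_ratio` and `joint_ratio` and the numbers are made up for illustration, and `math.prod` stands in for `tf.reduce_prod` on a single sample.

```python
import math

EPS = 1e-10  # same small constant as in the line under discussion


def per_branch_ratio(probs, old_probs):
    """Element-wise ratio: one value per action branch (current behaviour)."""
    return [p / (q + EPS) for p, q in zip(probs, old_probs)]


def joint_ratio(probs, old_probs):
    """Ratio of joint probabilities: product over branches first (proposed)."""
    return math.prod(probs) / (math.prod(old_probs) + EPS)


# Hypothetical probabilities of the chosen action in two branches.
probs = [0.5, 0.4]
old_probs = [0.25, 0.8]

print(per_branch_ratio(probs, old_probs))  # two ratios, roughly 2.0 and 0.5
print(joint_ratio(probs, old_probs))       # a single ratio, roughly 1.0
```

With these numbers the joint probability is unchanged (0.2 before and after), so the joint ratio stays near 1, while the per-branch ratios are far from 1 and would each be clipped separately. Whether that matters in practice depends on how the surrogate loss reduces over the last axis afterwards.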