EmbersArc / PPO

PPO implementation for OpenAI Gym environments, based on Unity ML-Agents

Probability of Action #4

Open Datoclement opened 6 years ago

Datoclement commented 6 years ago

I am looking into your code (which is pretty clean and clear, by the way) and have a question about one line.

In the file PPO/ppo/model.py, line 185

r_theta = probs / (old_probs + 1e-10)

Would it be more accurate to change it to r_theta = tf.reduce_prod(probs, axis=-1) / (tf.reduce_prod(old_probs, axis=-1) + 1e-10)?
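
[Editor's note: a minimal sketch of the distinction the question is drawing, not from the thread itself, written in TensorFlow 2 syntax. PPO's ratio r_θ = π_θ(a|s) / π_θ_old(a|s) is defined over the probability of the whole action, so when an action has several independent dimensions the per-dimension probabilities would be multiplied (or, more stably, their logs summed) before forming the ratio, whereas the element-wise division on line 185 yields one ratio per dimension. The shapes below are an assumption for illustration.]

```python
import tensorflow as tf

# Assumed shapes: per-dimension action likelihoods, [batch, action_dims],
# for a policy whose action dimensions are independent.
probs = tf.constant([[0.5, 0.8], [0.2, 0.9]])
old_probs = tf.constant([[0.4, 0.8], [0.3, 0.7]])

# Per-dimension ratio, as in model.py line 185: shape [batch, action_dims].
r_elementwise = probs / (old_probs + 1e-10)

# Joint ratio, as the question proposes: for independent dimensions the
# joint action probability is the product over the last axis, giving one
# ratio per sample, shape [batch].
r_joint = tf.reduce_prod(probs, axis=-1) / (
    tf.reduce_prod(old_probs, axis=-1) + 1e-10)

# Equivalent but more numerically stable: difference of summed log-probs,
# then exponentiate.
r_joint_log = tf.exp(
    tf.reduce_sum(tf.math.log(probs + 1e-10), axis=-1)
    - tf.reduce_sum(tf.math.log(old_probs + 1e-10), axis=-1))
```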

EmbersArc commented 6 years ago

This line came directly from the Unity implementation: https://github.com/Unity-Technologies/ml-agents/blob/master/python/unitytrainers/ppo/models.py

They have since updated their code, but the same line is still there, so I suggest reporting the issue to them.