Probability of Action

EmbersArc / PPO

PPO implementation for OpenAI gym environment based on Unity ML Agents


Open Datoclement opened 6 years ago

Datoclement commented 6 years ago

I am looking into your code (which is pretty clean and clear, by the way) and have a question about one line.

In the file PPO/ppo/model.py, line 185:

r_theta = probs / (old_probs + 1e-10)

Would it be more accurate to change it to r_theta = tf.reduce_prod(probs, axis=-1) / (tf.reduce_prod(old_probs, axis=-1) + 1e-10)?
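To make the difference between the two expressions concrete, here is a minimal plain-Python sketch for a single sample. It assumes that probs and old_probs hold one probability per action dimension and that the dimensions are independent, which this thread does not confirm. The function names per_dim_ratio and joint_ratio and the numbers are hypothetical, chosen only for illustration.

```python
import math

EPS = 1e-10  # same stabilizer as in the quoted line


def per_dim_ratio(probs, old_probs, eps=EPS):
    """Element-wise ratio: one value per action dimension (the quoted form)."""
    return [p / (q + eps) for p, q in zip(probs, old_probs)]


def joint_ratio(probs, old_probs, eps=EPS):
    """Ratio of the joint probabilities (the proposed form).

    Assumes independent action dimensions, so the joint probability
    is the product of the per-dimension probabilities.
    """
    return math.prod(probs) / (math.prod(old_probs) + eps)


# Hypothetical two-dimensional action for one sample.
probs = [0.5, 0.2]
old_probs = [0.25, 0.4]

per_dim = per_dim_ratio(probs, old_probs)  # roughly [2.0, 0.5]
joint = joint_ratio(probs, old_probs)      # roughly 1.0

print(per_dim, joint)
```

In this made-up case each per-dimension ratio is far from 1, while the joint ratio is about 1 because the two changes cancel. A clipping step applied afterwards could therefore treat the same sample quite differently depending on which form is used.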

EmbersArc commented 6 years ago

They have since updated it, but the same line is still there, so I suggest you report the issue there.