vwxyzjn opened this issue 2 years ago
Hey,
Thanks for taking the time to analyze this!
Yeah, some environments definitely have inconsistencies. You're right, I did notice that the different inits had a large impact, so you could try mimicking the TF1 init from stable_baselines.
Regarding the termination bonus: in the snippet you linked, I made sure the for loop iterates over n_steps + 1 (see https://github.com/bmazoure/ppo_jax/blob/main/train_ppo.py#L101), so technically the termination bonus should be included.
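If it helps, here is a minimal, self-contained sketch of that pattern: collecting n_steps + 1 value estimates so the value of the state after the last transition is available when computing returns. Note that value_fn and env_step are hypothetical stand-ins for illustration, not the repo's actual code.

```python
import numpy as np
import jax.numpy as jnp

n_steps = 128

# Hypothetical stand-ins for the real value network and environment.
def value_fn(obs):
    return float(np.sum(obs))

def env_step(obs):
    return obs + 1.0  # returns next_obs only, for brevity

obs = np.zeros(4)
values = []
for step in range(n_steps + 1):  # note the +1, as in train_ppo.py#L101
    values.append(value_fn(obs))  # the (n_steps + 1)-th value covers the final state
    if step < n_steps:  # the env itself is only stepped n_steps times
        obs = env_step(obs)

values = jnp.asarray(values)
assert values.shape == (n_steps + 1,)  # the extra entry is available for bootstrapping
```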
If you want, you can open a PR with some changes, and I'll try to re-run the agent with the value function init scale set to 1 and update the weights.
Hi @bmazoure,
Your PPO + JAX implementation caught my eye, and this is a really cool repo!
Based on your benchmark with W&B, I compared the performance of your implementation with mine and openai/baselines in this report. Here are some performance differences:

[performance comparison figure omitted; see the linked report]

I feel the reasons for the difference might be:
1. ppo_jax does not seem to bootstrap the value if the environment is not terminated (buffer.py#L18-L26), whereas the original implementation does (ppo2/runner.py#L56-L65, ppo2/runner.py#L50); see the GAE sketch below.
2. ppo_jax uses the same initialization scale for both the value and policy heads (models.py#L91). However, the value function should be initialized with scale 1 instead of 0.01 (common/policies.py#L49-L63); see the init sketch below.

What do you think?
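To make point 1 concrete, here is a minimal sketch of baselines-style GAE, where last_value bootstraps the return when the rollout ends without termination. The function name and signature are mine for illustration, taken from neither repo, and it uses a plain Python loop for clarity rather than speed:

```python
import jax.numpy as jnp

def compute_gae(rewards, values, dones, last_value, last_done,
                gamma=0.99, gae_lambda=0.95):
    # Baselines-style GAE: when the rollout ends mid-episode
    # (last_done == 0), last_value bootstraps the return; this is
    # the step that appears to be missing in buffer.py#L18-L26.
    n_steps = rewards.shape[0]
    advantages = jnp.zeros_like(rewards)
    lastgaelam = 0.0
    for t in reversed(range(n_steps)):
        if t == n_steps - 1:
            nextnonterminal = 1.0 - last_done  # bootstrap only if not done
            nextvalue = last_value             # value of the state after the rollout
        else:
            nextnonterminal = 1.0 - dones[t + 1]
            nextvalue = values[t + 1]
        delta = rewards[t] + gamma * nextvalue * nextnonterminal - values[t]
        lastgaelam = delta + gamma * gae_lambda * nextnonterminal * lastgaelam
        advantages = advantages.at[t].set(lastgaelam)
    return advantages, advantages + values  # advantages, returns

# Tiny usage example with dummy data.
adv, ret = compute_gae(rewards=jnp.ones(5), values=jnp.zeros(5),
                       dones=jnp.zeros(5), last_value=0.5, last_done=0.0)
```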
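And for point 2, a minimal Flax sketch of giving the policy head scale 0.01 and the value head scale 1.0, as baselines does in common/policies.py. The module name and layer sizes here are illustrative, not ppo_jax's actual architecture:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class ActorCritic(nn.Module):
    # Illustrative actor-critic with separate output-layer scales:
    # 0.01 keeps the initial policy near-uniform, while the value
    # head uses unit scale, matching common/policies.py#L49-L63.
    n_actions: int

    @nn.compact
    def __call__(self, x):
        x = jnp.tanh(nn.Dense(
            64, kernel_init=nn.initializers.orthogonal(jnp.sqrt(2)))(x))
        logits = nn.Dense(
            self.n_actions, kernel_init=nn.initializers.orthogonal(0.01))(x)
        value = nn.Dense(
            1, kernel_init=nn.initializers.orthogonal(1.0))(x)
        return logits, value

params = ActorCritic(n_actions=4).init(jax.random.PRNGKey(0), jnp.zeros((1, 8)))
```

The intuition for the split is that tiny policy-head weights make early action distributions close to uniform, whereas shrinking the value head to 0.01 just produces near-zero value predictions that slow critic learning.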