araffin / rl-tutorial-jnrr19

Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
MIT License
591 stars, 113 forks

Fix inconsistency between two mean_rewards on deterministic actions #19

Closed: jiriyu98 closed this 1 year ago

jiriyu98 commented 1 year ago

This PR fixes issue #18.

Make the hand-written evaluate function consistent with the default behavior of evaluate_policy, i.e. set deterministic=True.

For users who are not familiar with stable_baselines3, I think it would also help to add a few comments explaining the effect of deterministic.
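
To illustrate why the flag matters, here is a minimal sketch. It does not use stable_baselines3: ToyPolicy, its p_good parameter, and the toy reward are hypothetical stand-ins made up for this example. Only the predict(obs, deterministic=...) call shape and the deterministic=True default are meant to mirror model.predict and evaluate_policy.

```python
import random


class ToyPolicy:
    """Hypothetical stand-in for a trained model (not a stable_baselines3 class)."""

    def __init__(self, p_good=0.8, seed=0):
        self.p_good = p_good  # probability of sampling the "good" action 1
        self.rng = random.Random(seed)

    def predict(self, obs, deterministic=False):
        # Returns (action, state), like model.predict in stable_baselines3.
        if deterministic:
            # Always pick the most likely action.
            action = 1 if self.p_good >= 0.5 else 0
        else:
            # Sample from the action distribution.
            action = 1 if self.rng.random() < self.p_good else 0
        return action, None


def evaluate(model, num_episodes=100, episode_length=10, deterministic=True):
    """Hand-written evaluation loop; deterministic=True matches evaluate_policy's default."""
    episode_rewards = []
    for _ in range(num_episodes):
        obs = 0  # the toy environment has a single dummy observation
        total = 0.0
        for _ in range(episode_length):
            action, _ = model.predict(obs, deterministic=deterministic)
            total += float(action)  # toy reward: 1 for action 1, 0 otherwise
        episode_rewards.append(total)
    return sum(episode_rewards) / num_episodes


policy = ToyPolicy()
mean_det = evaluate(policy, deterministic=True)   # greedy actions
mean_sto = evaluate(policy, deterministic=False)  # sampled actions
print(f"deterministic: {mean_det:.2f}, stochastic: {mean_sto:.2f}")
```

If one evaluation samples actions and the other acts greedily, the two mean rewards generally differ even for the same policy, which is the kind of mismatch reported in #18.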

Closes #18