araffin / rl-tutorial-jnrr19

Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
MIT License
591 stars, 113 forks

inconsistent deterministic setting in tutorial #18

Closed: jiriyu98 closed this issue 1 year ago

jiriyu98 commented 1 year ago

Hello,

Thanks for your excellent contributions; they are really helpful!

I am not sure whether this is an issue, but it confused me at first. It concerns #1_getting_started.ipynb

Even though both values are computed with the same agent, mean_reward_before_train and mean_reward differ by a large margin. I checked the code and realized that mean_reward is computed with deterministic actions while mean_reward_before_train is not.

Unless this is set up intentionally to encourage people to investigate the implementation, it would probably be better to make the two consistent.

Best, J

araffin commented 1 year ago

Hello, you are right. The evaluate function is missing a deterministic argument. I would be happy to receive a PR that fixes it =)
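
For readers following along, here is a minimal sketch of what an evaluate function with a deterministic argument could look like. It is not the notebook's actual code or the fix from the PR. It assumes the model exposes `predict(obs, deterministic=...)` returning `(action, state)`, as Stable-Baselines models do, and a classic Gym-style environment where `step` returns four values. `ToyEnv` and `ToyModel` are hypothetical stand-ins so the example runs without any RL library installed.

```python
import random
from statistics import mean


def evaluate(model, env, num_episodes=100, deterministic=True):
    """Return the mean episode reward of `model` on `env`.

    Assumption: model.predict(obs, deterministic=...) returns (action, state),
    env.reset() returns obs, and env.step(a) returns (obs, reward, done, info).
    """
    episode_rewards = []
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            # The flag is forwarded so the caller decides how actions are chosen.
            action, _state = model.predict(obs, deterministic=deterministic)
            obs, reward, done, _info = env.step(action)
            total += reward
        episode_rewards.append(total)
    return mean(episode_rewards)


class ToyEnv:
    """Hypothetical env: five-step episodes, action 1 earns reward 1, action 0 earns 0."""

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, float(action == 1), self.t >= 5, {}


class ToyModel:
    """Hypothetical policy: samples action 1 with probability 0.6,
    and always picks action 1 when deterministic=True."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def predict(self, obs, deterministic=False):
        if deterministic:
            return 1, None
        return (1 if self.rng.random() < 0.6 else 0), None


# Same policy, same environment, different action selection.
greedy = evaluate(ToyModel(), ToyEnv(), num_episodes=200, deterministic=True)
sampled = evaluate(ToyModel(), ToyEnv(), num_episodes=200, deterministic=False)
print("deterministic:", greedy, "stochastic:", sampled)
```

With the same policy and environment, the two calls report different mean rewards purely because of how actions are selected, which is the kind of gap described in the report above. Passing the same `deterministic` value to both evaluations makes the numbers comparable.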

jiriyu98 commented 1 year ago

Great, I just created a PR (#19). Please have a look when you are free. ;-)