DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Understanding Seed #642

Closed: danielstankw closed this issue 3 years ago

danielstankw commented 3 years ago

Important Note: We do not do technical support or consulting, and we do not answer personal questions by email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.

Question

I am training with PPO on a custom env made in Robosuite. I do not understand how I can set the seed to a specific number in order to reproduce results. I have seen different ways of doing so and I am not sure whether what I did is correct. I would appreciate some feedback.

Additional context

```python
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.utils import set_random_seed

env = Monitor(env, monitor_dir, allow_early_resets=True)  # env: custom Robosuite env

env.seed(42)               # seed the environment
set_random_seed(seed=42)   # seed random, NumPy and PyTorch

# seed=42 in the constructor also seeds the env and the action space
model = PPO('MlpPolicy', env, verbose=2, seed=42, tensorboard_log="./learning_log/ppo_tensorboard/")
model.learn(total_timesteps=10_000, tb_log_name='learning', reset_num_timesteps=True)
```


araffin commented 3 years ago

Hello, you are doing it the right way. Normally, you only need to pass a seed to the constructor (it will also seed the env). If you don't get reproducible results, then the issue most probably comes from the env... You can quickly check that by using CartPole-v1 or Pendulum-v0 instead of your custom env.
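For reference, a minimal sketch of such a check (the seed, timestep budget and helper name are illustrative, not from this thread, and it assumes the old gym API used above): train twice on CartPole-v1 with only the constructor seed set and compare the learned policy parameters. On CPU, the two runs should match exactly if seeding is the only source of randomness.

```python
import gym
from stable_baselines3 import PPO

def train(seed: int) -> PPO:
    # Only the seed passed to the constructor is used here:
    # it also seeds the env and the action space.
    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=5_000)
    return model

m1, m2 = train(42), train(42)
# With the same seed (and CPU training), the learned weights should be identical
p1, p2 = m1.policy.state_dict(), m2.policy.state_dict()
print(all((p1[k] == p2[k]).all() for k in p1))  # expected: True
```

If the CartPole-v1 run reproduces but the Robosuite run does not, the non-determinism most likely comes from the custom env itself (e.g. its own RNG or the simulator) rather than from SB3.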

Related issues: https://github.com/DLR-RM/stable-baselines3/issues/557 and https://github.com/DLR-RM/stable-baselines3/issues/369

danielstankw commented 3 years ago

Thank you :)