DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Training reproducibility #973

Closed · a240160572 closed this issue 2 years ago

a240160572 commented 2 years ago

Important Note: We do not do technical support or consulting, and we don't answer personal questions by email. Please post your question on the RL Discord, Reddit, or Stack Overflow in that case.

Question

Hey, I am working on an algorithm comparison for my custom environment. The environment generates a random scenario for every rollout, so I would prefer to train all algorithms on the same sequence of scenarios.

I have tried set_random_seed(seed=1) and env.seed(seed=1) for the environment, and model = A2C("MlpPolicy", env, seed=1) for the model.

None of them yields two identical training results.

I would appreciate it if you could show me how to set the seeds properly.
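
For reference, a minimal sketch of what I tried (MyCustomEnv is a placeholder for my environment):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.utils import set_random_seed

set_random_seed(seed=1)      # seeds Python's random, NumPy and PyTorch

env = MyCustomEnv()          # placeholder for the custom environment
env.seed(seed=1)             # gym 0.21-style env seeding

model = A2C("MlpPolicy", env, seed=1)
model.learn(total_timesteps=10_000)
```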

araffin commented 2 years ago

Hello,

None of them yields two identical training results.

This probably means that your env does not implement the seed() method correctly. We have tests that check exactly that for the built-in gym envs: https://github.com/DLR-RM/stable-baselines3/blob/master/tests/test_deterministic.py
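
A rough sketch of such a determinism check (this is the idea only, not the actual test code; it assumes the gym 0.21 API, where step() returns a 4-tuple and env is an instance of the custom environment):

```python
import numpy as np

def rollout(env, seed, n_steps=100):
    """Collect observations from a seeded rollout with random actions."""
    env.seed(seed)
    env.action_space.seed(seed)
    observations = [env.reset()]
    for _ in range(n_steps):
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        observations.append(obs)
        if done:
            observations.append(env.reset())
    return observations

# Two rollouts with the same seed must match exactly.
first, second = rollout(env, seed=1), rollout(env, seed=1)
assert all(np.allclose(a, b) for a, b in zip(first, second))
```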

a240160572 commented 2 years ago

Thank you for the quick reply. I checked my env with the test file. It is indeed not deterministic.

I am running a collision avoidance policy training. The starting point, goal, and obstacle positions are randomly generated for every rollout.

I tried adding the seed to the reset() signature, as in def reset(self, seed: Optional[int] = None):, like the example environments from gym. However, it raises TypeError: reset() got an unexpected keyword argument 'seed'. Perhaps this is the wrong way to do it.

qgallouedec commented 2 years ago

SB3 works with gym 0.21 (for now). The reset() method in gym 0.21 doesn't take any arguments.

In gym 0.22 and above, reset() takes several arguments, including the seed. You are probably referring to the docs for gym 0.24, aren't you?
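
In other words, a quick sketch of the two APIs:

```python
# gym 0.21 (what SB3 currently targets): seeding and resetting are separate
env.seed(1)
obs = env.reset()          # reset() takes no arguments

# gym 0.22+ (the API the gym master-branch envs follow)
obs = env.reset(seed=1)    # the seed is passed directly to reset()
```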

a240160572 commented 2 years ago

SB3 works with gym 0.21 (for now). The reset() method in gym 0.21 doesn't take any arguments.

Ok, that makes sense then. I am referring to the code from https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py, which targets gym 0.24.

araffin commented 2 years ago

Ok, that makes sense then. I am referring to the code from https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py, which targets gym 0.24.

If you want to use that version, we have instructions in the documentation to install the associated PR (#780).

But for now, I would recommend implementing the seed() method first (we keep backward compatibility, so your code won't break later when we merge the PR).

Closing, as the issue comes from the custom gym env and not from SB3.

a240160572 commented 2 years ago

Closing, as the issue comes from the custom gym env and not from SB3.

Sorry for that.

But for now, I would recommend implementing the seed() method first (we keep backward compatibility, so your code won't break later when we merge the PR).

May I ask for a hint on how to implement seed() in gym 0.21? I appreciate it.

qgallouedec commented 2 years ago

I advise you to take inspiration from the implementations of the many environments shipped with gym 0.21: https://github.com/openai/gym/tree/c755d5c35a25ab118746e2ba885894ff66fb8c43
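
Roughly, the pattern used by those environments looks like this (a sketch only; MyCollisionEnv and its attributes are placeholders for your env):

```python
import gym
from gym.utils import seeding

class MyCollisionEnv(gym.Env):
    def seed(self, seed=None):
        # Store a seeded RNG and use it for *all* randomness in the env.
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        # Draw start, goal and obstacle positions from self.np_random
        # (never from the global numpy RNG) so rollouts are reproducible.
        self.start = self.np_random.uniform(low=-1.0, high=1.0, size=2)
        self.goal = self.np_random.uniform(low=-1.0, high=1.0, size=2)
        ...
```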

If you have difficulty doing this, I encourage you to ask for help on the RL Discord.

ReHoss commented 2 years ago

I encountered the same issue: indeed, env.seed() does nothing under gym==0.21 unless we implement it ourselves, as was done in the built-in environments. It would be nice to write something about this in the documentation :) Thank you