DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] Should train & eval environment seeds differ? #459

Closed leokraft closed 3 weeks ago

leokraft commented 4 weeks ago

❓ Question

First of all, thank you for the wonderful project!

I have a question about the random seed used for creating environments. It appears the same seed is used for both training and evaluation environments (see here).

Is it not a problem to use the same seed for both training and evaluation? Intuitively, I would expect different seeds to be used.


qgallouedec commented 4 weeks ago

No, it's not a problem. A shared initial seed doesn't cause correlation between training and evaluation data. The state of the environment (including its RNG state) depends not only on the seed but also on every action taken, on how many times you've sampled from the action space, and so on. As soon as any difference occurs, even a tiny one, the two environments become completely uncorrelated.
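A minimal stdlib sketch of this point (illustrative only, not SB3 code): two RNGs started from the same seed produce identical streams, but as soon as one of them is consumed differently, even by a single extra draw, the streams diverge for good.

```python
import random

# Stand-ins for the training env's and eval env's RNGs, seeded identically.
train_rng = random.Random(0)
eval_rng = random.Random(0)

# Same seed, same position in the stream: the draws match exactly.
assert train_rng.random() == eval_rng.random()

# The training env consumes one extra sample (think: one more
# action-space draw during rollout) that the eval env does not.
_ = train_rng.random()

# The two streams are now offset and no longer match.
print(train_rng.random() == eval_rng.random())  # False
```

The same logic applies to real environments: identical seeds only mean identical behavior for as long as the two environments receive exactly the same sequence of actions and RNG draws, which stops being true almost immediately during training.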

I hope this is clear; I went over it several times while formulating it. If anyone has a clearer formulation, I'm happy to take it :sweat_smile:.

leokraft commented 4 weeks ago

Thank you for the quick answer. I find it quite clear, so no worries. 😊

I understand that this applies to randomized agents; however, does it also hold for environments where the setting itself is randomized, such as Frozen Lake with randomized hole positions? With the same seed, wouldn't we evaluate on the same map used for training instead of testing generalization on a new map?

qgallouedec commented 3 weeks ago

Maybe for the very first episode of the very first evaluation, but not for the rest. That said, Frozen Lake is not a good example, because the holes are not randomized: the map is fixed by default.
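To see why only the very first episode could coincide, here is an illustrative stdlib sketch of a hypothetical environment that draws its hole positions from its RNG on every reset (the `holes` helper is made up for this example, not a Gymnasium API):

```python
import random

def holes(rng, n=4):
    """Hypothetical reset logic: draw n hole positions on an 8x8 grid
    from the environment's own RNG."""
    return [rng.randrange(64) for _ in range(n)]

# Training env and eval env seeded identically.
train_rng = random.Random(0)
eval_rng = random.Random(0)

# Very first reset of each env: same seed, same stream, same map.
first_train_map = holes(train_rng)
first_eval_map = holes(eval_rng)
print(first_train_map == first_eval_map)  # True

# During training the env's RNG state advances (more resets, more
# sampling), so later training maps no longer match the eval map.
second_train_map = holes(train_rng)
print(second_train_map == first_eval_map)  # False
```

So at worst the first evaluation map matches the first training map; every subsequent reset draws from a different point in the stream, and the maps decorrelate.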

leokraft commented 3 weeks ago

Got it! Thank you for all the insightful comments.