DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.95k stars 1.68k forks source link

[Feature Request] Raise error when same object in memory passed to vectorized environment #1151

Closed Rocamonde closed 1 year ago

Rocamonde commented 1 year ago

🚀 Feature

TLDR: add a check to the result of env construction in make_vec_env that raises an error if all envs have the same memory ID. Happy to submit a PR for this.

Motivation

Currently when one creates a vectorized environment using make_vec_env and passes a function to create an environment multiple times, there are no checks making sure that the function indeed returns a fresh environment every time.

A priori it might seem that it's the user's responsibility to ensure this. However, if one is constructing an environment using hyperparameters and has a function that takes arguments to do this, e.g.

def make_my_env(*args, **kwargs): ...

it is relatively easy to forget to wrap this as a closure. One should do:

def make_env():
    return make_my_env(*args, **kwargs): ...

venv = make_vec_env(make_env, ...)

or

venv = make_vec_env(lambda: make_my_env(*args, **kwargs), ...)

but if one instead does

env = make_my_env(*args, **kwargs)
venv = make_vec_env(lambda: env, ...)

which is not something totally unreasonable to do at first sight, one will get totally undefined behavior where the same environment is being operated sequentially with actions that were calculated from multiple steps ago.

Same happens if you do

venv = DummyVecEnv([lambda: env] * n_envs, ...)

since it's fairly common to write

venv = DummyVecEnv([lambda: env], ...)

and to write

venv = DummyVecEnv([make_env] * n_envs, ...)

so mixing both up is not so unexpected. This has actually happened to me twice in the space of two months, and it took me several hours to realize of the errors. environments start behaving in totally bizarre ways, and it's hard to realize that it's not a policy training issue or a reward mis-specification issue unless you try to run with a single parallel environment and notice that it works fine.

Pitch

Raising an error if the same env instance is returned every time the env constructor is called is fairly straightforward and would avoid this type of error. I bet that no one would ever actually do this intentionally and it is almost surely always an error.

Just do in the DummyVecEnv initializer:

        if len(set(map(id, self.envs))) != len(self.envs):
            raise ValueError("The environments should be different instances")

or set([id(env) for env in self.envs])) using a list comprehension which is probably more pythonic.

Alternatives

No response

Additional context

No response

Checklist

araffin commented 1 year ago

Hello,

thanks for the proposal =), it is indeed a common mistake.

or set([id(env) for env in self.envs])) using a list comprehension which is probably more pythonic.

Yes, please go ahead =) (I would be more verbose for the error message though)

Rocamonde commented 1 year ago

Sounds good! Added a PR. Maybe it's worth adding a test for this, should be quite simple but I'm not familiar with your fixture system, do you think you could do that?

qgallouedec commented 1 year ago

Sounds good! Added a PR. Maybe it's worth adding a test for this, should be quite simple but I'm not familiar with your fixture system, do you think you could do that?

I'm handling the tests

Rocamonde commented 1 year ago

Cool sounds good!

baha2r commented 1 year ago

Hey I was doing the same, which apparently was not correct! Thanks for updating it. But the problem is that now I have no idea how to run the right code. Is there any kind of sample that could help me use make_vec_env

araffin commented 1 year ago

it. But the problem is that now I have no idea how to run the right code.

the error raised should give you some hints https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L28-L35

Is there any kind of sample that could help me use make_vec_env

there are examples in the doc and in this issue, but if you don't give any code example, it's hard to help. Another way to (ugly) fix that issue is to use sub processes (but best would be to solve the root cause).

timosturm commented 1 year ago

@Rocamonde Bless you for this issue and PR! This made me aware of some huge mistake I made when using this library.

Maybe other people have made the same mistake I did, so I want to leave this here:

Wrong:

environments = make_some_environments()   # creates a list of environments

DummyVecEnv(
    [
        lambda: env for env in environments
    ]
)

This does not produce a list of lambdas that each return one of the environments, but instead a list of lambdas that all return the last environment in the list. See https://stackoverflow.com/a/34021333 for more details. To get the expected behavior do the following:

Correct (but ugly):

environments = make_some_environments()   # creates a list of environments

DummyVecEnv(
    [
       (lambda env: lambda: env)(outer_env) for outer_env in environments
    ]
)

Or even better, do what is detailed in the issue.

sebnapi commented 1 year ago

Why is this code failing?

https://github.com/AI4Finance-Foundation/FinRL/blob/7414adb012b554f7c684d4615830dc5c31a094d1/finrl/agents/stablebaselines3/models.py#L287-L310

It's always returning a new object?

qgallouedec commented 1 year ago

Why is this code failing?

It shouldn't. I can't reproduce the error you describe. If it persists, I suggest you open a dedicated new issue.

sebnapi commented 1 year ago

Why is this code failing?

It shouldn't. I can't reproduce the error you describe. If it persists, I suggest you open a dedicated new issue.

Yeah, I'm just trying to get it working, but I get exactly this value error introduced here. I'm currently trying to downgrade to run it.

JoshuaClouse commented 7 months ago

@sebnapi did you ever create an issue to track this? I'm having the same issue and can't seem to find a way for the compiler to stop erroring:

def create_default_environment():
    #1. Create base environemnt
    newEnv = gym_super_mario_bros.make("SuperMarioBros-v0")
    #2. Simplify controls
    newEnv = JoypadSpace(newEnv, SIMPLE_MOVEMENT)
    #3. Original state.shape is (240, 256, 3) where 240x256 are the dimensions and 3 is the color channels RGB
    # keep_dim keeps one of the color channels to make (240, 256, 1) which is greyscale.
    return GrayScaleObservation(newEnv, keep_dim=True)

env = DummyVecEnv([lambda: create_default_environment()])

The examples weren't very helpful and I have yet to find a concrete example of how to use this constructor.