DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

SubprocVecEnv Sets Out-of-Range Seeds for My Environments (ScenarioNet Environment) #1921

Closed: chrisgao99 closed this issue 1 month ago

chrisgao99 commented 2 months ago

🐛 Bug

When using SubprocVecEnv from stable-baselines3,

env = make_vec_env(lambda: env_creator3(env_config), n_envs=n_envs, vec_env_cls=SubprocVecEnv)

the seeds are automatically set sequentially, starting from a base seed (e.g., 33247589, 33247590, etc.). The relevant code is here:

https://github.com/DLR-RM/stable-baselines3/blob/285e01f64aa8ba4bd15aa339c45876d56ed0c3b4/stable_baselines3/common/vec_env/subproc_vec_env.py#L46
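The seeding rule described above can be sketched as follows. This is a simplified illustration, assuming SB3 2.x behaviour where a VecEnv draws a random base seed from the uint32 range when none is given and then assigns base + index to each sub-environment; `sequential_seeds` is a hypothetical helper, not an SB3 function.

```python
import numpy as np


def sequential_seeds(num_envs, seed=None):
    """Hypothetical helper mimicking how a VecEnv assigns first-reset seeds."""
    if seed is None:
        # No base seed given: draw one from the uint32 range, which is why
        # values such as 33247589 or 4119632371 show up
        seed = int(np.random.randint(0, np.iinfo(np.uint32).max, dtype=np.uint32))
    # Each sub-environment gets the base seed plus its index
    return [seed + idx for idx in range(num_envs)]


print(sequential_seeds(3, seed=33247589))  # [33247589, 33247590, 33247591]
print(sequential_seeds(2))  # two consecutive, usually very large, seeds
```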

However, my environment requires the seed to be within the range 0-9 as I only have 10 scenarios saved in a directory, and each seed corresponds to a specific scenario file.

One workaround is to wrap the env and override the seed in its reset method:

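import gymnasium as gym
import numpy as np

class MyWrapper(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)

    def reset(self, **kwargs):
        # Replace whatever seed the VecEnv passed with a scenario index in 0..9
        # (int() because Gymnasium expects a plain Python int as seed)
        kwargs['seed'] = int(np.random.randint(0, 10))
        obs, info = self.env.reset(**kwargs)
        return obs, info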

But this is not very convenient for testing: because the wrapper overwrites any seed it receives, I can't test scenarios 0 to 9 one by one by calling env.reset(seed=...) directly. Instead, I have to modify the wrapper again.

So are there any ways to manually control the seed range for envs generated with SubprocVecEnv?
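For the testing problem described above, one option is to let the wrapper override the seed only when it is missing or out of range. The sketch below is a hypothetical alternative, not something proposed in the thread: it assumes the scenarios are indexed 0 to 9 and folds large VecEnv seeds into that range with a modulo, so consecutive VecEnv seeds still map to different scenarios while reset(seed=3) keeps selecting scenario 3. `scenario_seed` and `NUM_SCENARIOS` are illustrative names.

```python
import random
from typing import Optional

NUM_SCENARIOS = 10  # assumption: scenario files are indexed 0..9


def scenario_seed(seed: Optional[int], rng: random.Random,
                  num_scenarios: int = NUM_SCENARIOS) -> int:
    """Hypothetical helper: map whatever seed reset() receives to a valid scenario."""
    if seed is None:
        # Later resets from a VecEnv pass seed=None: pick a scenario at random
        return rng.randrange(num_scenarios)
    # In-range seeds are unchanged (e.g. reset(seed=3) for testing);
    # large seeds such as 4119632371 are folded into range with a modulo
    return seed % num_scenarios


rng = random.Random(0)
print([scenario_seed(s, rng) for s in (3, 4119632371, 4119632372)])  # [3, 1, 2]
```

Inside the wrapper's reset, kwargs['seed'] = scenario_seed(kwargs.get('seed'), rng) would then replace the unconditional random draw.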

araffin commented 2 months ago

Hello, I think there is a misconception between the seed, which is used for the pseudo-random generator, and scenarios.

chrisgao99 commented 1 month ago

Thanks. Do you mean that the seed data[0] in this file

https://github.com/DLR-RM/stable-baselines3/blob/285e01f64aa8ba4bd15aa339c45876d56ed0c3b4/stable_baselines3/common/vec_env/subproc_vec_env.py#L46

is a seed for a pseudo-random generator?

But if I don't override the seed in my wrapper with kwargs['seed'] = np.random.randint(0,10), data[0] is passed to my env as the scenario seed, which raises a seed-out-of-range error.

So can I control the data[0] seed?

qgallouedec commented 1 month ago

Why not sample the scenario index?

scenario_idx = np.random.randint(0,10)

chrisgao99 commented 1 month ago

May I ask where to put this line of code?

qgallouedec commented 1 month ago

In the reset method of your env.

chrisgao99 commented 1 month ago

Yeah, this also works, but it's the same as changing my env wrapper: if I add a sampling scheme to the reset function, I will have to change the scheme whenever I want to use other env seeds.

Maybe modifying the reset function is the only method right now. But it would be better if Stable-Baselines3 had a parameter to specify how env seeds are sampled when training with a VecEnv.

araffin commented 1 month ago

Again, I think you are confusing the seed of the pseudo-random generator with options. With a VecEnv, you can directly call methods that set parameters in the env (see the docs and other issues), and you can also set the reset options.

import numpy as np

# This is a seed of NumPy's global (legacy) pseudo-random generator;
# np.random.seed() only accepts values between 0 and 2**32 - 1
seed = 767320316
np.random.seed(seed)
for _ in range(5):
    print(np.random.randint(0, 10))

# Seed again to obtain the same sequence
np.random.seed(seed)
for _ in range(5):
    print(np.random.randint(0, 10))
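
The same reasoning extends to the large seeds a SubprocVecEnv hands out: a seed only fixes the random sequence, and the values drawn from it can be restricted to the scenario range. A minimal sketch, assuming NumPy's `default_rng` (which, unlike the legacy `np.random.seed`, also accepts seeds larger than 2**32 - 1) and a hypothetical `scenario_sequence` helper:

```python
import numpy as np


def scenario_sequence(seed, n, num_scenarios=10):
    """Hypothetical helper: n scenario indices drawn from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return [int(x) for x in rng.integers(0, num_scenarios, size=n)]


# A seed like the ones SubprocVecEnv generated in this thread
a = scenario_sequence(4119632371, 5)
b = scenario_sequence(4119632371, 5)
print(a == b)  # True: same seed, same sequence
print(all(0 <= x < 10 for x in a))  # True: every index is a valid scenario
```
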
chrisgao99 commented 1 month ago

I see what you mean. Thank you.

At the beginning of training, the make_vec_env function takes a "seed" parameter, which is the initial seed for the random number generator. If I don't assign a value to it, a large random integer is used instead, which causes a seed-out-of-range error in my env. So I just need to set a smaller initial seed for the random number generator.

Here is the code to reproduce it for other people to understand.

import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class SeedPrintWrapper(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)

    def reset(self, **kwargs):
        seed = kwargs.get('seed', None)
        print(f"Environment seed: {seed}")

        return self.env.reset(**kwargs)

def make_custom_env(env_id):
    def _init():
        env = gym.make(env_id)
        env = SeedPrintWrapper(env)
        return env
    return _init

if __name__ == '__main__':

    # Create vectorized environments
    vec_env = make_vec_env(make_custom_env("CartPole-v1"), n_envs=2, vec_env_cls=SubprocVecEnv)

    # Initialize the model with the vectorized environment
    model = A2C("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=100)

Output (the "data[0] is:" lines come from a debug print added to subproc_vec_env.py):

data[0] is:  4119632371
data[0] is:  4119632372
Environment seed: 4119632371
Environment seed: 4119632372
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None

After I set the seed in make_vec_env,

vec_env = make_vec_env(make_custom_env("CartPole-v1"), n_envs=2, vec_env_cls=SubprocVecEnv, seed=0)

outputs:

data[0] is:  0
Environment seed: 0
data[0] is:  1
Environment seed: 1
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
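
Since each sub-environment receives seed + index on its first reset, setting seed=0 keeps every first-reset seed inside 0 to 9 only while n_envs is at most 10 (more generally, while seed + n_envs - 1 < 10). Later resets receive seed=None, as the output shows, so the environment still has to handle that case. A small check, with `max_envs_for_base_seed` as a hypothetical helper and 10 scenarios as an assumption:

```python
def max_envs_for_base_seed(base_seed, num_scenarios=10):
    """Largest n_envs for which base_seed + i stays below num_scenarios for every env i."""
    # Env i gets base_seed + i, so we need base_seed + n_envs - 1 < num_scenarios
    return max(num_scenarios - base_seed, 0)


print(max_envs_for_base_seed(0))  # 10
print(max_envs_for_base_seed(8))  # 2
print(max_envs_for_base_seed(12))  # 0
```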