Closed chrisgao99 closed 1 month ago
Hello, I think there is a confusion between the seed, which is used for the pseudo random generator, and scenarios.
Thanks. Do you mean the seed data[0] in this file
https://github.com/DLR-RM/stable-baselines3/blob/285e01f64aa8ba4bd15aa339c45876d56ed0c3b4/stable_baselines3/common/vec_env/subproc_vec_env.py#L46
is the seed of a pseudo random generator?
But if I don't override the seed in my wrapper, like this: kwargs['seed'] = np.random.randint(0,10), the seed data[0] is passed to my env and used as my scenario seed, which raises a seed out of range error.
So can I control the data[0] seed?
Why not sample the scenario index instead?
scenario_idx = np.random.randint(0,10)
May I ask where to put this line of code?
In the reset method of your env.
Yeah, this also works, but it has the same drawback as changing my env wrapper. If I add a sampling scheme to the reset function, I will have to change that scheme every time I want to use other env seeds.
Maybe modifying the reset function is the only method right now. But it would be better if Stable Baselines3 had a parameter that let me specify how env seeds are sampled when training with a VecEnv.
Again, I think you are confusing the seed of the pseudo random generator with reset options. With a VecEnv, you can directly call a method that sets parameters in the env (see the doc and other issues), and you can also set the reset options.
import numpy as np

# This is a seed of the numpy default pseudo random generator
# (np.random.seed only accepts values between 0 and 2**32 - 1)
seed = 767320316
np.random.seed(seed)
for _ in range(5):
    print(np.random.randint(0, 10))

# Seed again to obtain the same sequence
np.random.seed(seed)
for _ in range(5):
    print(np.random.randint(0, 10))
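To make the distinction concrete, here is a minimal sketch that keeps the two ideas apart: the seed only initializes the generator, while the scenario is chosen through the reset options. `ScenarioEnv`, `num_scenarios`, and the `scenario_idx` option key are hypothetical names, and the class only mimics the Gymnasium `reset(seed=..., options=...)` signature with the standard library instead of subclassing `gym.Env`.

```python
import random


class ScenarioEnv:
    """Hypothetical stand-in for an env backed by a fixed set of scenario files."""

    def __init__(self, num_scenarios=10):
        self.num_scenarios = num_scenarios
        self.rng = random.Random()
        self.scenario_idx = None

    def reset(self, seed=None, options=None):
        # The seed only (re)initializes the pseudo random generator,
        # so any integer is acceptable here, however large.
        if seed is not None:
            self.rng = random.Random(seed)

        options = options or {}
        if "scenario_idx" in options:
            # Explicit choice, e.g. when testing scenarios one by one.
            idx = options["scenario_idx"]
            if not 0 <= idx < self.num_scenarios:
                raise ValueError(f"scenario_idx {idx} is out of range")
            self.scenario_idx = idx
        else:
            # Otherwise sample a valid scenario from the seeded generator.
            self.scenario_idx = self.rng.randrange(self.num_scenarios)
        return self.scenario_idx


env = ScenarioEnv()

# A large seed is fine: it never becomes a scenario index directly.
first = [env.reset(seed=4119632371)] + [env.reset() for _ in range(4)]
second = [env.reset(seed=4119632371)] + [env.reset() for _ in range(4)]
print(first == second)  # same seed, same scenario sequence

# Pick a specific scenario without touching the seed.
fixed = env.reset(options={"scenario_idx": 3})
print(fixed)
```

With this split, training can use any seed while evaluation can loop over `scenario_idx` values 0 to 9 explicitly.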
I see what you mean. Thank you.
At the beginning of training, the make_vec_env function takes a "seed" parameter, which is the initial seed for the random number generator. If I don't assign a value to it, it is a large random integer, which causes a seed out of range error in my env. So I just need to set a smaller initial seed for the random number generator.
Here is the code to reproduce it for other people to understand.
import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv


class SeedPrintWrapper(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)

    def reset(self, **kwargs):
        seed = kwargs.get('seed', None)
        print(f"Environment seed: {seed}")
        return self.env.reset(**kwargs)


def make_custom_env(env_id):
    def _init():
        env = gym.make(env_id)
        env = SeedPrintWrapper(env)
        return env
    return _init


if __name__ == '__main__':
    # Create vectorized environments
    vec_env = make_vec_env(make_custom_env("CartPole-v1"), n_envs=2, vec_env_cls=SubprocVecEnv)
    # Initialize the model with the vectorized environment
    model = A2C("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=100)
Outputs:
data[0] is: 4119632371
data[0] is: 4119632372
Environment seed: 4119632371
Environment seed: 4119632372
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
After I set the seed in make_vec_env:
vec_env = make_vec_env(make_custom_env("CartPole-v1"), n_envs=2, vec_env_cls=SubprocVecEnv, seed=0)
Outputs:
data[0] is: 0
Environment seed: 0
data[0] is: 1
Environment seed: 1
Environment seed: None
Environment seed: None
Environment seed: None
Environment seed: None
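The two runs above are consistent with each sub-environment receiving the base seed plus its index on the first reset, and `None` on later resets. Here is a minimal sketch of that scheme, assuming this is how the seeds are derived. `derive_env_seeds` is a hypothetical helper, not an SB3 function, and the random fallback for a missing seed is an assumption modeled on the large value printed in the first run.

```python
import random


def derive_env_seeds(base_seed, n_envs):
    """Return one seed per sub-environment: base_seed, base_seed + 1, ..."""
    if base_seed is None:
        # Assumption: with no seed given, a large random base seed is drawn.
        base_seed = random.randrange(2**32)
    return [base_seed + rank for rank in range(n_envs)]


print(derive_env_seeds(0, 2))           # [0, 1]
print(derive_env_seeds(4119632371, 2))  # [4119632371, 4119632372]
```

Under this reading, passing a small `seed` only keeps the first reset within range. It does not select scenarios, which is why sampling the scenario index separately is the more robust option.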
🐛 Bug
When using SubprocVecEnv from stable-baselines3, the seeds are automatically set sequentially starting from a base seed, e.g., 33247589, 33247590, etc. The relevant code is the reset handling in stable_baselines3/common/vec_env/subproc_vec_env.py.
However, my environment requires the seed to be within the range 0-9 as I only have 10 scenarios saved in a directory, and each seed corresponds to a specific scenario file.
One workable approach is to wrap the env and override the seed inside the wrapper's reset function, for example with kwargs['seed'] = np.random.randint(0,10).
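A minimal sketch of such a wrapper, assuming the env exposes a Gymnasium-style `reset(seed=..., options=...)`. `ScenarioSeedWrapper` and `RecordingEnv` are hypothetical names used only for illustration, and the standard library `random` module stands in for `np.random`.

```python
import random


class ScenarioSeedWrapper:
    """Hypothetical wrapper: replaces any incoming seed with one in [0, num_scenarios)."""

    def __init__(self, env, num_scenarios=10):
        self.env = env
        self.num_scenarios = num_scenarios

    def reset(self, **kwargs):
        # Discard the seed sent by the vectorized env and sample a valid one.
        kwargs["seed"] = random.randrange(self.num_scenarios)
        return self.env.reset(**kwargs)


class RecordingEnv:
    """Stand-in env that only remembers the seed it was reset with."""

    def __init__(self):
        self.last_seed = None

    def reset(self, seed=None, options=None):
        self.last_seed = seed
        return seed


inner = RecordingEnv()
wrapped = ScenarioSeedWrapper(inner)
wrapped.reset(seed=33247589)  # the large seed never reaches the inner env
print(inner.last_seed)
```

Because the wrapper discards the incoming seed, calling `reset(seed=3)` on it cannot select scenario 3.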
But this is not very convenient for testing. If I want to test the scenarios for seeds 0 to 9 one by one, I can't directly call env.reset(seed). Instead, I have to modify the wrapper again.
So is there any way to manually control the seed range for envs created with SubprocVecEnv?