Closed learningxiaobai closed 2 years ago
Hello,
If you look at the yaml files, you will find an `n_envs` hyperparameter, which can also be changed on the fly: `python train.py --algo ppo --env CartPole-v1 -params n_envs:4`. The type of vec env is controlled via the `--vec-env` argument.
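For intuition, `key:value` overrides like `n_envs:4` can be split on the colon and the value evaluated as a Python literal. The sketch below is a hypothetical illustration of that parsing idea, not the zoo's actual implementation (`parse_overrides` is an invented name):

```python
# Hypothetical sketch of parsing "key:value" CLI overrides such as
# "n_envs:4" into a hyperparameter dict. Not the zoo's real code.
import ast

def parse_overrides(pairs):
    """Turn ["n_envs:4", "gamma:0.99"] into {"n_envs": 4, "gamma": 0.99}."""
    overrides = {}
    for pair in pairs:
        key, _, raw = pair.partition(":")
        try:
            value = ast.literal_eval(raw)  # int/float/bool/str literals
        except (ValueError, SyntaxError):
            value = raw  # fall back to the raw string
        overrides[key] = value
    return overrides

print(parse_overrides(["n_envs:4", "gamma:0.99"]))
# → {'n_envs': 4, 'gamma': 0.99}
```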
Thanks a lot.
Hello, when I use `python train.py --algo tqc --env PandaReach-v1 -params n_envs:4`, I get an error: `ValueError: Error: the model does not support multiple envs; it requires a single vectorized environment.` Does this model not support multiple environments, or is there some other problem? Thanks a lot.
@learningxiaobai
As the error states, the algorithm does not support multiple environments (off-policy methods like DQN, DDPG, TD3, SAC and TQC only work with one env). Close the issue if this answered the question.
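The error itself is just a guard on the number of environments. The sketch below illustrates the kind of check that produces it; the function name and signature here are hypothetical, not stable-baselines3's actual code:

```python
# Illustrative sketch of a single-env guard like the one that raises
# this error. Names are hypothetical, not stable-baselines3 internals.
def check_env_count(n_envs, supports_multi_env):
    if n_envs > 1 and not supports_multi_env:
        raise ValueError(
            "Error: the model does not support multiple envs; "
            "it requires a single vectorized environment."
        )

check_env_count(n_envs=1, supports_multi_env=False)  # single env: OK
try:
    # An off-policy algo like TQC with n_envs > 1 trips the guard.
    check_env_count(n_envs=4, supports_multi_env=False)
except ValueError as e:
    print(e)
```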
> does not support multiple environments (off-policy methods like DQN, DDPG, TD3, SAC and TQC only work with one env).

Not yet: https://github.com/DLR-RM/stable-baselines3/issues/179 (HER replay buffer support will take additional time)
Hello, even when I am not using the HER replay buffer, it doesn't work as expected. For example, with `python train.py --algo ddpg --env HalfCheetahBulletEnv-v0 -params n_envs:4` I still get the same error. What is the problem?
Are you using the experimental branch https://github.com/DLR-RM/stable-baselines3/pull/439? If not, that's normal: as @Miffyli said, off-policy algos on master do not support multi-env training (but this will change once the PR is merged).
I added experimental support for multi env with HER in https://github.com/DLR-RM/stable-baselines3/pull/654
great work!!!!!
> I added experimental support for multi env with HER in DLR-RM/stable-baselines3#654
Hello, I used the branch `feat/multienv-her`, but when I run `python train.py --algo tqc --env PandaReach-v1 -params n_envs:4`, I still get the error: `ValueError: Error: the model does not support multiple envs; it requires a single vectorized environment.`
You need to use that branch of contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/50
Hello, I met an error. Is there something wrong with HER? Thanks.
```
(rl-baselines3-zoo-master) C:\codes\rl\rl-baselines3-zoo-master>python train.py --algo tqc --env PandaPush-v1 -params n_envs:3
========== PandaPush-v1 ==========
Seed: 1088921540
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 2048),
             ('buffer_size', 1000000),
             ('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
             ('gamma', 0.95),
             ('learning_rate', 0.001),
             ('n_envs', 3),
             ('n_timesteps', 1000000.0),
             ('policy', 'MultiInputPolicy'),
             ('policy_kwargs', 'dict(net_arch=[512, 512, 512], n_critics=2)'),
             ('replay_buffer_class', 'HerReplayBuffer'),
             ('replay_buffer_kwargs', "dict( online_sampling=True, goal_selection_strategy='future', n_sampled_goal=4, )"),
             ('tau', 0.05)])
Using 3 environments
Creating test environment
pybullet build time: Nov  2 2021 15:42:29
argv[0]=
C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\gym\logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
argv[0]=
argv[0]=
argv[0]=
Using cpu device
Log path: logs/tqc/PandaPush-v1_2
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    exp_manager.learn(model)
  File "C:\codes\rl\rl-baselines3-zoo-master\utils\exp_manager.py", line 202, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\sb3_contrib\tqc\tqc.py", line 299, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 375, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\sb3_contrib\tqc\tqc.py", line 194, in train
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
  File "C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 652, in sample
    samples.append(self.buffers[i].sample(int(batch_sizes[i]), env))
  File "C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 212, in sample
    return self._sample_transitions(batch_size, maybe_vec_env=env, online_sampling=True)  # pytype: disable=bad-return-type
  File "C:\ProgramData\Anaconda3\envs\rl-baselines3-zoo-master\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 295, in _sample_transitions
    episode_indices = np.random.randint(0, self.n_episodes_stored, batch_size)
  File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1338, in numpy.random._bounded_integers._rand_int32
ValueError: high <= 0
```
I don't think it is related to multiprocessing. I suggest you open a new issue.
The model tries to sample transitions before the first episode is stored. What is the value of `learning_starts`? Did you change the environment in any way?
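The `ValueError: high <= 0` comes from `np.random.randint` being asked to sample episode indices from an empty range: with `n_episodes_stored == 0`, there is nothing to draw from. A minimal reproduction of that failing line:

```python
import numpy as np

n_episodes_stored = 0  # no completed episode stored in the buffer yet
batch_size = 4

try:
    # Mirrors the failing line in her_replay_buffer.py:
    # episode_indices = np.random.randint(0, self.n_episodes_stored, batch_size)
    np.random.randint(0, n_episodes_stored, batch_size)
except ValueError as err:
    sampling_error = err

print(sampling_error)  # the empty range [0, 0) cannot be sampled
```

A large enough `learning_starts` delays the first gradient update until enough steps have been collected for at least one full episode to be in the buffer, which avoids this situation.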
Nothing changed; I just used the default settings.
Adding `learning_starts` to the PandaPush config works, thanks @qgallouedec
Is there any instruction for using multiprocessing in the RL Baselines3 Zoo, such as a `python train.py ...` command? I can't find where it is handled in train.py, so can I only call the related libraries for training directly? Similar to:

```python
env_id = "CartPole-v1"
num_cpu = 4  # Number of processes to use
# Create the vectorized environment
```

Thanks
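In stable-baselines3 itself, parallel environments are provided by `SubprocVecEnv` (selected in the zoo via the `--vec-env subproc` flag mentioned above). The stdlib-only sketch below just illustrates the underlying idea of `num_cpu` worker processes each stepping its own environment; the toy "environment" and all names here are hypothetical, not SB3 APIs:

```python
# Stdlib-only illustration of the idea behind SubprocVecEnv: several
# worker processes each step their own (toy) environment in parallel.
from multiprocessing import Pool

def step_env(env_id):
    # Stand-in for one env.step(action) call: returns (env index, fake reward).
    return env_id, env_id * 0.5

if __name__ == "__main__":
    num_cpu = 4  # number of worker processes / parallel environments
    with Pool(processes=num_cpu) as pool:
        results = pool.map(step_env, range(num_cpu))
    print(results)  # one (env_id, reward) pair per environment, in order
```

The real `SubprocVecEnv` adds what this sketch omits: persistent worker processes, pipes for actions/observations, per-env seeding, and automatic resets on episode end.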