Closed: swagatk closed this issue 3 years ago.
@araffin I have a faint recollection that somebody was working on Kuka environments related to SB3 (or the zoo), but I cannot seem to find it. Can you comment on this?
Hi,
I did some more investigation and have been able to fix these errors to some extent. I am sharing the result here for the benefit of other readers. I can avoid the previous error by passing the Kuka environment as a dict variable, which lets me use the `make_vec_env` function without any errors. Looking into the source code of this function helped. The code now looks like this:
```python
env_id = NormalizeObsvnWrapper
env_kwargs = dict(
    env=KukaDiverseObjectEnv(maxSteps=20, isDiscrete=False, renders=False,
                             removeHeightHack=False)
)
vec_env = make_vec_env(env_id, n_envs=4, monitor_dir=monitor_path, env_kwargs=env_kwargs)
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=64),
    net_arch=dict(qf=[128, 64, 32], pi=[128, 64, 64])
)
# create a model
model = SAC('CnnPolicy', vec_env, buffer_size=70000, batch_size=256,
            policy_kwargs=policy_kwargs, tensorboard_log=tb_log_path)
# train the model
model.learn(total_timesteps=50000, log_interval=4, tb_log_name='kuka_sac_mp')
```
I get the following error this time:
```
ValueError                                Traceback (most recent call last)
<ipython-input-8-755f0f448b2d> in <module>()
     46
     47 model = SAC('CnnPolicy', vec_env, buffer_size=70000, batch_size=256,
---> 48             policy_kwargs=policy_kwargs, tensorboard_log=tb_log_path)
     49
     50 # train the model: 50K time steps is adequate

2 frames
/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/base_class.py in __init__(self, policy, env, policy_base, learning_rate, policy_kwargs, tensorboard_log, verbose, device, support_multi_env, create_eval_env, monitor_wrapper, seed, use_sde, sde_sample_freq, supported_action_spaces)
    173         if not support_multi_env and self.n_envs > 1:
    174             raise ValueError(
--> 175                 "Error: the model does not support multiple envs; it requires " "a single vectorized environment."
    176             )
    177

ValueError: Error: the model does not support multiple envs; it requires a single vectorized environment.
```
So, it turns out that the current `SAC` implementation does not support multiple environments. The code works if I use `DummyVecEnv` as follows:
```python
env = NormalizeObsvnWrapper(KukaDiverseObjectEnv(maxSteps=20, isDiscrete=False, renders=False,
                                                 removeHeightHack=False))
vec_env = DummyVecEnv([lambda: env])
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=64),
    net_arch=dict(qf=[128, 64, 32], pi=[128, 64, 64])
)
model = SAC('CnnPolicy', vec_env, buffer_size=70000, batch_size=256,
            policy_kwargs=policy_kwargs, tensorboard_log=tb_log_path)
model.learn(total_timesteps=50000, log_interval=4, tb_log_name='kuka_sac_mp')
```
Apparently, the `PPO` implementation does support multi-processing. The following code seems to work:
```python
env_id = NormalizeObsvnWrapper
env_kwargs = dict(
    env=KukaDiverseObjectEnv(maxSteps=20, isDiscrete=False, renders=False,
                             removeHeightHack=False)
)
vec_env = make_vec_env(env_id, n_envs=4, monitor_dir=monitor_path, env_kwargs=env_kwargs)
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=64),
    net_arch=[128, dict(vf=[128, 64, 32], pi=[128, 64, 64])]  # check PPO documentation
)
# create a model
model = PPO('CnnPolicy', vec_env, n_steps=2048, batch_size=64,
            policy_kwargs=policy_kwargs, tensorboard_log=tb_log_path)
# train the model
model.learn(total_timesteps=50000, log_interval=4, tb_log_name='kuka_ppo_mp', callback=eval_callback)
```
So, to conclude the discussion, `SAC` does not support multi-processing while `PPO` does. There is no problem with the library; I was simply not using the interface properly. Thank you for creating this great library. Also, in this post, I demonstrate how to use Gym wrappers and custom policy networks with the `KukaDiverseObject` environment.

Regards, Swagat
> SAC does not support multi-processing while PPO does.
yes, that's a planned feature (https://github.com/DLR-RM/stable-baselines3/issues/179) and there is an experimental branch for it: https://github.com/DLR-RM/stable-baselines3/pull/439
If the issue is solved, you can close it ;)
Yes, I think the issue is resolved. Thanks for your help. I am closing the thread.
@araffin
Hi,
It seems that SAC still does not work with dict obs when using multiprocessing. Could I know when it will be implemented?
all the best,
@JatGwingLam See the reply by araffin above: SAC multiprocessing is not supported yet and is planned/being worked on. If you really want to try it out, you could pull PR #439, install SB3 from that branch, and see if it works out.
Thanks for your reply, I've tried that and found dict obs is not supported.
You could try merging the `master` branch into the PR branch; however, there were drastic changes in the dict-obs update, so I am going to assume it will not merge automatically and would require a ton of cleaning up.
@araffin any plans to continue on this some day?
> You could try merging master branch to the PR branch, however there were drastic changes in dict-obs update so I am going to assume it will not automatically merge, and would require a ton of cleaning up.
The branch is up to date with master, it is just that the current multi-env implementation was not made (but can be adapted) for dict-obs. I plan to continue working on that in September.
> So, to conclude the discussion, `SAC` does not support multi-processing while `PPO` does. There is no problem with the library; I was simply not using the interface properly. Thank you for creating this great library. Also, in this post, I demonstrate how to use Gym wrappers and custom policy networks with the `KukaDiverseObject` environment.
This is so helpful! Thanks a bunch @swagatk !
Dear All,
I am new to SB3 and have been able to run a few basic examples. Recently, I trained `KukaDiverseObjectEnv` successfully with the SAC algorithm. Now I want to run multiple environments using `SubprocVecEnv`, but it did not work for me. I want to share my experience here in the hope that someone will be able to help me fix this bug. I am running all of these codes on Google Colab.
I am using a Gym wrapper to convert the observation images from channels-last to channels-first format and to normalize the pixels to the range [0, 1]. My wrapper is as follows:
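The original snippet did not survive this copy of the thread, so here is a minimal sketch of such a wrapper. The class name `NormalizeObsvnWrapper` matches the code used later in the thread, but the body below is my reconstruction, not the author's code:

```python
import gym
import numpy as np

class NormalizeObsvnWrapper(gym.ObservationWrapper):
    """Convert HxWxC uint8 image observations to CxHxW float32 in [0, 1]."""

    def __init__(self, env):
        super().__init__(env)
        h, w, c = env.observation_space.shape
        # report the transformed (channels-first, normalized) observation space
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=(c, h, w), dtype=np.float32)

    @staticmethod
    def normalize(frame):
        # scale pixels to [0, 1] and move channels first (HWC -> CHW)
        return np.transpose(np.asarray(frame, dtype=np.float32) / 255.0, (2, 0, 1))

    def observation(self, obs):
        return self.normalize(obs)
```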
I also use a custom CNN network (`CustomCNN`) to extract features from the input images. Now I can train a single environment as follows:
This is how the training looks on TensorBoard:
Now I want to run 2 environments simultaneously using `SubprocVecEnv`. First I define the `make_env` function as follows:

My main training program is as follows:
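Neither snippet survived this copy, so here is a sketch of what the failing attempt likely looked like, assuming the `NormalizeObsvnWrapper` and the env settings used elsewhere in the thread. With SAC, this is exactly the setup that raises the multi-env `ValueError` discussed above:

```python
def make_env():
    """Return a thunk that builds one wrapped Kuka env; SubprocVecEnv calls it per worker."""
    def _init():
        # heavy imports inside the thunk so each subprocess builds its own env
        from pybullet_envs.bullet.kuka_diverse_object_gym_env import KukaDiverseObjectEnv
        env = KukaDiverseObjectEnv(maxSteps=20, isDiscrete=False,
                                   renders=False, removeHeightHack=False)
        return NormalizeObsvnWrapper(env)  # wrapper defined earlier in the thread
    return _init

def train(n_envs=2):
    from stable_baselines3 import SAC
    from stable_baselines3.common.vec_env import SubprocVecEnv
    vec_env = SubprocVecEnv([make_env() for _ in range(n_envs)])
    # with n_envs > 1 this raises:
    # "Error: the model does not support multiple envs; ..."
    model = SAC('CnnPolicy', vec_env, buffer_size=70000, batch_size=256)
    model.learn(total_timesteps=50000)
    return model

# train() should be called under an `if __name__ == "__main__":` guard,
# which SubprocVecEnv requires on platforms that spawn subprocesses.
```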
I get the following error on Google Colab:
I will greatly appreciate any help in this regard.
Thanks, Swagat