Hello, if you want to see the conversion from the Gym API to the VecEnv API, it is here: https://github.com/DLR-RM/stable-baselines3/blob/feat/gymnasium-support/stable_baselines3/common/vec_env/dummy_vec_env.py#L60-L71
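The sketch below shows that per-env conversion. It assumes the linked DummyVecEnv lines follow this pattern:

- `done = terminated or truncated`
- the timeout is kept in `info["TimeLimit.truncated"]`
- a finished env is auto-reset, with its last observation stored under `info["terminal_observation"]`

The helper name `to_vecenv_step` and its `reset_fn` argument are hypothetical and are not SB3 API.

```python
from typing import Any, Callable, Dict, Tuple

# Gymnasium-style step result: (obs, reward, terminated, truncated, info)
GymnasiumStep = Tuple[Any, float, bool, bool, Dict[str, Any]]
# SB3 VecEnv-style result for one sub-env: (obs, reward, done, info)
VecEnvStep = Tuple[Any, float, bool, Dict[str, Any]]


def to_vecenv_step(
    step_result: GymnasiumStep,
    reset_fn: Callable[[], Tuple[Any, Dict[str, Any]]],
) -> VecEnvStep:
    """Hypothetical helper: convert one Gymnasium 5-tuple into the SB3 4-tuple."""
    obs, reward, terminated, truncated, info = step_result
    info = dict(info)  # copy so the caller's dict is not mutated
    done = terminated or truncated
    # Keep the timeout signal so algorithms can still bootstrap on truncation
    info["TimeLimit.truncated"] = truncated and not terminated
    if done:
        # VecEnvs auto-reset: stash the real last obs, return the next episode's first obs
        info["terminal_observation"] = obs
        obs, _reset_info = reset_fn()
    return obs, reward, done, info


def fake_reset() -> Tuple[int, Dict[str, Any]]:
    """Hypothetical reset function returning (first_obs, reset_info)."""
    return 0, {}


# A step that ended because of a time limit (terminated=False, truncated=True)
obs, reward, done, info = to_vecenv_step((7, 1.0, False, True, {}), fake_reset)
# A step in the middle of an episode
obs2, reward2, done2, info2 = to_vecenv_step((3, 0.0, False, False, {}), fake_reset)
```

The first call returns the reset observation `0` and keeps `7` under `terminal_observation`. This is why VecEnv users read the final observation from `info` rather than from the returned `obs`.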
Also relevant: https://github.com/DLR-RM/rl-baselines3-zoo/blob/feat/gymnasium-support/rl_zoo3/gym_patches.py#L27-L50
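The linked zoo patch is a modified TimeLimit wrapper. The sketch below illustrates only the general mechanism and is not the zoo's code. It is a toy time limit that reports `truncated=True` once a step budget is used up, leaving `terminated` for real task endings.

`CountingEnv` and `SimpleTimeLimit` are hypothetical stand-ins so the sketch runs without Gymnasium installed. With Gymnasium, `gymnasium.wrappers.TimeLimit(env, max_episode_steps=...)` is the standard wrapper.

```python
from typing import Any, Dict, Tuple

Step = Tuple[int, float, bool, bool, Dict[str, Any]]


class CountingEnv:
    """Hypothetical toy env: the observation is the step count; it never terminates by itself."""

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action: Any) -> Step:
        self.t += 1
        return self.t, 1.0, False, False, {}


class SimpleTimeLimit:
    """Hypothetical minimal time limit: sets truncated=True once max_episode_steps is reached."""

    def __init__(self, env: CountingEnv, max_episode_steps: int) -> None:
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action: Any) -> Step:
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            truncated = True  # episode cut by the time budget, not by the task
        return obs, reward, terminated, truncated, info


env = SimpleTimeLimit(CountingEnv(), max_episode_steps=3)
obs, _ = env.reset()
history = []
while True:
    obs, reward, terminated, truncated, info = env.step(None)
    history.append((obs, terminated, truncated))
    if terminated or truncated:
        break
```

The episode ends on the third step with `terminated=False, truncated=True`. A conversion layer can turn that into `done=True` plus a timeout flag.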
Normally, though, the Gym API is for a single env only. If you want to define a VecEnv directly, take a look at https://github.com/DLR-RM/rl-baselines3-zoo/pull/355, where we define a VecEnv for EnvPool envs.
Thanks for the links, the patched time limit works perfectly. The problem is that I can't get it to work with other wrappers like VecTransposeImage and VecFrameStack. Those return 4 values (with done) instead of 5 (with terminated and truncated). I'll post some example code here so the use case is easier to understand.
I found that SuperSuit (ss) actually has an ss.concat_vec_envs_v1() wrapper that works with base_class='stable_baselines3' and wraps the env into a vector env. So I think (?) I can use that instead of writing my own vector env wrapper like https://github.com/DLR-RM/rl-baselines3-zoo/pull/355.

That said, the best option here may be to write a wrapper that turns the PettingZoo env into vector envs with the right return types for VecFrameStack and VecTransposeImage. You said the done-based API will remain the standard inside SB3.
I can't get that to work with other wrappers like VecTransposeImage and VecFrameStack,
It looks like you are mixing Gym wrappers and VecEnv wrappers; as you noticed, they don't work together.
a wrapper which turns the pettingzoo env into vector envs that have the right return types to work with VecFrameStack and VecTransposeImage,
Yes, that is probably the best option.
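A rough sketch of the core of such a wrapper follows. It assumes the underlying batched env:

- returns `(obs, rewards, terminated, truncated, infos)`, with `infos` as a list of dicts;
- already auto-resets finished sub-envs (if it does not, that logic and `terminal_observation` would have to be added here).

`FiveToFourAdapter` and `FakeBatchedEnv` are hypothetical names. A real version would subclass `stable_baselines3.common.vec_env.VecEnv` and implement its abstract methods (`step_async`, `step_wait`, and so on). This plain class skips that to isolate the return-type conversion.

```python
from typing import Any, Dict, List, Tuple

import numpy as np


class FiveToFourAdapter:
    """Hypothetical adapter: batched 5-tuple step -> SB3-style (obs, rewards, dones, infos)."""

    def __init__(self, batched_env: Any) -> None:
        self.batched_env = batched_env

    def reset(self) -> np.ndarray:
        obs, _infos = self.batched_env.reset()
        return obs  # SB3 VecEnv.reset() returns observations only

    def step(self, actions: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, List[Dict[str, Any]]]:
        obs, rewards, terminated, truncated, infos = self.batched_env.step(actions)
        terminated = np.asarray(terminated, dtype=bool)
        truncated = np.asarray(truncated, dtype=bool)
        dones = terminated | truncated
        infos = [dict(info) for info in infos]  # copy each sub-env's info dict
        for i, info in enumerate(infos):
            # Preserve the timeout signal that `dones` alone would hide
            info["TimeLimit.truncated"] = bool(truncated[i] and not terminated[i])
        return obs, rewards, dones, infos


class FakeBatchedEnv:
    """Hypothetical 2-env batch: env 0 terminates, env 1 hits a time limit."""

    def reset(self) -> Tuple[np.ndarray, List[Dict[str, Any]]]:
        return np.zeros((2, 1)), [{}, {}]

    def step(self, actions: np.ndarray):
        return np.ones((2, 1)), np.array([1.0, 0.5]), [True, False], [False, True], [{}, {}]


venv = FiveToFourAdapter(FakeBatchedEnv())
first_obs = venv.reset()
obs, rewards, dones, infos = venv.step(np.array([0, 0]))
```

Both sub-envs report `done=True`. Only env 1 is flagged as a timeout, which is what VecFrameStack/VecTransposeImage-style wrappers and SB3 algorithms expect to receive.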
Just as an update: I got this working by modifying ss.sb3_vector_wrapper (used in ss.concat_vec_envs_v1; I opened a PR for it). But I think it would make the most sense if stable-baselines3 supported creating Gymnasium/PettingZoo vector envs directly (@araffin). As said earlier in this thread, the differing number of return values from step() prevents the existing SB3 VecEnv functions from working.
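Working code using ss:
```python
import supersuit as ss
from stable_baselines3.common import vec_env

import utils  # user's own helper module (the "Utils helper file" in the original post)

# env_config, rollout_len, num_envs, num_cpus and num_frames are defined elsewhere in the training script
env = utils.parallel_env(render_mode="rgb_array", env_config=env_config, max_cycles=rollout_len)  # load from Melting Pot into a PettingZoo env
# Keep only the "RGB" entry of each observation dict (and of the observation space)
env = ss.observation_lambda_v0(env, lambda x, _: x["RGB"], lambda s: s["RGB"])
# Expose each agent as one sub-environment of a vector env
env = ss.pettingzoo_env_to_vec_env_v1(env)
# Concatenate num_envs copies and wrap them as an SB3 VecEnv
env = ss.concat_vec_envs_v1(
    env,
    num_vec_envs=num_envs,
    num_cpus=num_cpus,
    base_class="stable_baselines3",
)
env = vec_env.VecMonitor(env)
env = vec_env.VecTransposeImage(env, skip=True)  # skip=True leaves observations untransposed
env = vec_env.VecFrameStack(env, num_frames)
```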
I got this working by modifying
Good to hear =)
I was thinking it would probably make the most sense if there was support for creating gymnasium/pettingzoo vector envs directly with stable-baselines3
As explained in its paper/blog post, SB3 is focused on single-agent, model-free RL. Support for anything beyond that should live in external repositories (like imitation or offline RL). We also should not add extra dependencies (like PettingZoo or SuperSuit), so I would disagree with that statement.
The only custom VecEnvs we are considering adding now are EnvPool and Isaac Gym. Both will probably be implemented in the zoo, as they don't cover all VecEnv features.
Working code using ss:
Closing as the original question is solved.
❓ Question
Posting this here so as not to spam the Gymnasium integration PR (#1327), as afaik it's a use-case question rather than an issue with the PR. I will edit with example code to make things clearer. Mainly, I want to know the best practice for converting envs whose step() returns terminated and truncated bools into SB3's API, which uses done signals.
I would like to make vector envs, but I run into issues due to the differing number of values returned by step() (5 vs 4). My initial thought was to ignore truncation and set done equal to terminated. However, from reading discussions and documentation, it seems best to set done = terminated or truncated. PR comments here also say to use a TimeLimit wrapper to capture the truncation signal. Is this then the best practice?
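The reason the truncation signal still matters after merging it into `done` shows up in a one-step TD target. A timeout should still bootstrap from the next state's value, while a true termination should not. The numbers below are made up for illustration, and `td_target` is a hypothetical helper. SB3's `ReplayBuffer` applies the same idea through `info["TimeLimit.truncated"]` when `handle_timeout_termination=True`.

```python
def td_target(reward: float, next_value: float, gamma: float, terminated: bool) -> float:
    """One-step TD target: bootstrap from V(s') unless the episode truly terminated."""
    return reward + gamma * (1.0 - float(terminated)) * next_value


reward, next_value, gamma = 1.0, 10.0, 0.99

# An episode cut by a time limit
terminated, truncated = False, True
done = terminated or truncated

naive = td_target(reward, next_value, gamma, terminated=done)  # wrongly treats the timeout as an end
correct = td_target(reward, next_value, gamma, terminated=terminated)  # still bootstraps

# Recovering "real" termination from SB3-style (done, info) data
info = {"TimeLimit.truncated": truncated and not terminated}
recovered_terminated = done and not info["TimeLimit.truncated"]
```

With only `done`, the timeout looks like a real ending and the value estimate is cut off. Keeping `TimeLimit.truncated` lets the learner recover `terminated=False`.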
Example code of wrapping the env with this TimeLimit wrapper and doing this conversion would be greatly appreciated.
Relevant references: https://github.com/DLR-RM/stable-baselines3/blob/feat/gymnasium-support/docs/guide/vec_envs.rst https://github.com/openai/gym/issues/3102#issuecomment-1275909754 https://gymnasium.farama.org/content/migration-guide/ https://github.com/DLR-RM/stable-baselines3/pull/780#discussion_r1116365773
Edit: a bit more context for what my issue was (converting the step function): https://github.com/DLR-RM/stable-baselines3/pull/1327#issuecomment-1451232543
Full code below: sb3_train.py (updating older training script with older pettingzoo using gym rather than gymnasium):
Utils helper file (also updating original script with old pettingzoo/gym rather than gymnasium):
Error: