DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] Multi agent environment #273

Closed camillo2008 closed 2 years ago

camillo2008 commented 2 years ago

Hi,

I was trying to use the zoo to tune my PPO hyperparameters, but I can't get my vec env to work with it. I use an environment with multiple agents inside, so every step call returns multiple rewards and actions. This causes the Monitor class (around line 94) to raise an exception, because sum cannot be applied to the list (TypeError: unsupported operand type(s) for +: 'int' and 'list'); even with only one agent inside the env, I still return a list.
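As a minimal illustration of the failure (the reward values below are made up), summing a list of per-agent reward lists reproduces exactly this TypeError:

```python
# Monitor accumulates the per-step rewards and sums them at the end of an
# episode; if every step returns a list of per-agent rewards instead of a
# float, the sum starts at 0 (an int) and fails on the first list.
rewards = [[0.1, -0.2], [0.3, 0.5]]  # one entry per step, one reward per agent
total = sum(rewards)  # TypeError: unsupported operand type(s) for +: 'int' and 'list'
```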

Should I change my code (if I return only the sum, not wrapped in a list, it works), or is there something I am missing?

I searched for a solution without success. I tried to use VecMonitor, but I can't get it to wrap my env alone; instead VecMonitor wraps the Monitor (UserWarning: The environment is already wrapped with a Monitor wrapper but you are wrapping it with a VecMonitor wrapper, the Monitor statistics will be overwritten by the VecMonitor ones.) and crashes in the same way.

I am new to SB3 and even newer to the zoo; maybe the concept of many agents inside one env is wrong?

araffin commented 2 years ago

Because I use an environment with multiple agents inside, so every step call returns multiple rewards and actions.

Then you must use a VecEnv or have one monitor file per sub-env.
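For the second option, SB3's make_vec_env can create the sub-envs and write one Monitor file per sub-env via monitor_dir; a minimal sketch with a placeholder env id and log directory:

```python
from stable_baselines3.common.env_util import make_vec_env

# Each of the 4 sub-envs is wrapped with its own Monitor, and the Monitor
# files are written to ./monitor_logs/
vec_env = make_vec_env("CartPole-v1", n_envs=4, monitor_dir="./monitor_logs")
```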

The environment is already wrapped with a Monitor wrapper but you are wrapping it with a VecMonitor wrapper, the Monitor statistics will be overwritten by the VecMonitor ones.) and crashes in the same way.

Then one option is to not wrap it with Monitor and wrap it only with VecMonitor.
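Roughly, a sketch assuming your VisualFlightEnvVec is a working VecEnv subclass (its constructor arguments are unknown here, so they are left as a placeholder):

```python
from stable_baselines3.common.vec_env import VecMonitor

venv = VisualFlightEnvVec(...)  # your custom, already-vectorized env (placeholder args)
venv = VecMonitor(venv)  # episode statistics at the VecEnv level, no per-sub-env Monitor
```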

camillo2008 commented 2 years ago

Then you must use a VecEnv or have one monitor file per sub-env.

I am already doing that; I simply didn't know how to train a VecEnv in the zoo, and whether it is possible.

```python
from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvIndices

class VisualFlightEnvVec(VecEnv):
    ...
```

Then one option is to not wrap it with Monitor and wrap it only with VecMonitor.

How can I do this? The zoo wraps my env with Monitor by default; how can I remove the Monitor and add only the VecMonitor?

My ppo.yml is something like this:

```yaml
compass_env-v0:
  normalize: "{'norm_obs': False, 'norm_reward': False}"
  env_wrapper: stable_baselines3.common.vec_env.VecMonitor
  vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor
  policy: "MultiInputPolicy"
  n_envs: 1
  n_steps: 500
  n_timesteps: 50000
```

araffin commented 2 years ago

didn't know how to train a VecEnv in the zoo, and whether it is possible.

That's possible but you need a fork of the zoo. We create the env here: https://github.com/DLR-RM/rl-baselines3-zoo/blob/c2f00ea81c42daa0af0e5d131eb51b16552d5d8b/utils/exp_manager.py#L543

so you need to replace this line with a custom version of make_vec_env (you can take a look at the source code in SB3).
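A rough sketch of what such a replacement could look like (the function name and the VisualFlightEnvVec constructor arguments are assumptions, not the actual zoo/SB3 API):

```python
from stable_baselines3.common.vec_env import VecMonitor

def make_custom_vec_env(env_id, n_envs, monitor_path=None, **env_kwargs):
    # Hypothetical constructor for the already-vectorized custom env
    venv = VisualFlightEnvVec(env_id, num_envs=n_envs, **env_kwargs)
    # Wrap only with VecMonitor, skipping the per-sub-env Monitor that the
    # default make_vec_env would add
    return VecMonitor(venv, filename=monitor_path)
```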

The zoo wraps my env with Monitor by default; how can I remove the Monitor and add only the VecMonitor?

Same as before: you can have a fork of the RL Zoo with a custom make_vec_env.

vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor

Yes, this would work if you overwrite make_vec_env so that it does not wrap your env with a Monitor.

I will probably enable the use of already-vectorized envs later (for instance when using EnvPool or Isaac Gym).

camillo2008 commented 2 years ago

Thank you very much. I don't know if I will try to customize the zoo to support a VecEnv; in any case, my question can be considered closed.