DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Question about VecEnv using a custom environment in SB3 #1968

Closed wilhem closed 2 weeks ago

wilhem commented 1 month ago

❓ Question

I have a custom environment (it inherits from gymnasium.Env, and yes, check_env runs without any errors or warnings), and now I'm trying to migrate it to a vectorized environment.

My question is: since the official documentation shows an example using a standard CartPole-v1 environment and not really a custom class or function, how do I adapt my environment class to be vectorized?

A minimal example is:

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize, VecEnv, SubprocVecEnv

from gymnasium.spaces import Box, Dict

import gymnasium as gym
import numpy as np
import argparse
import shutil
import time
import os

class CustomEnv(gym.Env):

    def __init__(self):
        super().__init__()

        self.observation_space = Dict({'observation': Box(low = np.array([-np.inf, -np.inf, -np.inf, -np.radians(MAX_ANGLE, dtype = np.float64), -np.radians(1, dtype = np.float64), 2]),
                                                          high = np.array([np.inf, np.inf, np.inf, np.radians(MAX_ANGLE, dtype = np.float64), np.radians(71, dtype = np.float64), 15]), 
                                                          shape = (6,), dtype = np.float64),
                                       'achieved_goal': Box(low = np.array([-np.inf, -np.inf, -np.inf]),
                                                            high = np.array([np.inf, np.inf, np.inf]),
                                                            shape = (3,), dtype = np.float64),
                                       'desired_goal': Box(low = np.array([-np.inf, -np.inf, -np.inf]),
                                                           high = np.array([np.inf, np.inf, np.inf]),
                                                           shape = (3,), dtype = np.float64)
                                       })

        self.action_space = Box(low = np.array([-1.0, -1.0, -1.0]),
                                high = np.array([1.0, 1.0, 1.0]),
                                shape = (3,), dtype = np.float64)

        pass

    def get_default_observation(self):
        """
        Simple implementation returning the default observation which is a zero vector in the shape of the observation space.

        :return: reset observation vector
        :rtype: dict
        """
        pass

    def get_observations(self):
        """
        This get_observation implementation builds the required observation for the crane reaching problem
        All values are gathered here from the robot and the target objects. All values are normalized into the range [-1, 1].

        :return: observation, desired_goal, achieved_goal
        :rtype: dict
        """
        pass

    def render(self, mode = None):
        pass

    def _compute_reward_not_vectorized(self, achieved_goal, desired_goal):
        """
        This is an abstract method from Gymnasium:
        https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/a35b1c1fa669428bf640a2c7101e66eb1627ac3a/gym_robotics/core.py#L8
        """
        pass

    def compute_reward(self, achieved_goal, desired_goal, info = {}):

        pass

    def compute_terminated(self, achieved_goal, desired_goal, info):
        """
        This is an abstract method from Gymnasium:
        https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/a35b1c1fa669428bf640a2c7101e66eb1627ac3a/gym_robotics/core.py#L8
        An episode is done if the distance between "end effector" and "target" < 0.05

        :return: True if the termination condition is met, False otherwise
        :rtype: bool
        """
        pass

    def compute_truncated(self, achieved_goal, desired_goal, info):

        pass

    def get_info(self):

        pass

    def __call__(self):
        return self

def train(model_class, env, model_path, logdir_path, n_max):

    print("Checking the environment...")
    check_env(env)  # returns None; raises or warns if the env is not compliant
    print("Environment is compatible with Gymnasium!")

    vec_env = make_vec_env(env)
    vec_env = VecNormalize(venv = vec_env, training = True, norm_obs = True, norm_reward = True)

    model = model_class(policy = 'MultiInputPolicy', 
                        env = vec_env, 
                        learning_rate = 3e-4,
                        n_steps = 2048,
                        batch_size = 256,
                        n_epochs = 10,
                        gamma = 0.98,
                        ent_coef = 0.2, 
                        vf_coef = 0.5,
                        max_grad_norm = 0.5,
                        tensorboard_log = logdir_path,
                        device = "cuda", 
                        policy_kwargs = dict(net_arch = [2048, 2048]),
                        verbose = 0)

    for i in range(n_max):

        model.learn(total_timesteps = 10_000, log_interval = 1, reset_num_timesteps = True, tb_log_name = "ppo", progress_bar = False)
        print(f"Saving the model...")

def main():

    env = CustomEnv()

    train(PPO, env, 'model_path', 'output', 10)

if __name__ == '__main__':
    main()

My question now is: print(type(vec_env)) reports a DummyVecEnv.

Is that correct, or am I doing something wrong?

qgallouedec commented 1 month ago

Yes,

from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env(CustomEnv, n_envs=4)

is the way to go.

If you want to use SubprocVecEnv, use this instead:

from stable_baselines3.common.vec_env import SubprocVecEnv

vec_env = make_vec_env(CustomEnv, n_envs=4, vec_env_cls=SubprocVecEnv)

The answer is in the documentation example "Multiprocessing: Unleashing the Power of Vectorized Environments".

EDIT: pass the env class instead of the env.

AxKo commented 1 month ago

Well, there is a small problem in the above code since 'env' is not callable.

However, even following the tutorial it does not work for me, and I traced the problem to 'isinstance(env, gymnasium.Env)' in the function _patch_env. Although my CustomEnv inherits from gymnasium.Env, the isinstance call returns False. If I look at the inheritance structure at that code location in the debugger, with 'for cls in env.__class__.__mro__: print(cls)', I see that the inheritance information is gone: <class '__main__.CustomEnv'> <class 'typing.Generic'> <class 'object'>

However, if I do the same in my main code, it looks as expected: <class '__main__.CustomGridEnv'> <class 'gymnasium.core.Env'> <class 'typing.Generic'> <class 'object'>

I do this under Windows using stable-baselines3 v2.3.2 (which uses 'spawn' as start_method). Does anyone have an idea what I'm doing wrong?

Thanks

wilhem commented 1 month ago

Wait, it is callable because if you look in the class definition I implemented the call method:

    def __call__(self):
        return self

qgallouedec commented 1 month ago

I'm not sure I understand your issue.

code since 'env' is not callable.

Why is it a problem?

AxKo commented 1 month ago

Wait, it is callable because if you look in the class definition I implemented the call method:

Indeed, you are right. I didn't spot that in your code :-(

But does the following line of code work for you?
And are you doing this under Windows? I always get the error "The environment is of type <class '__main__.CustomEnv'>, not a Gymnasium environment".

vec_env = make_vec_env(env, n_envs=4, vec_env_cls=SubprocVecEnv)

qgallouedec commented 1 month ago

With a full example it may be clearer (above comment edited):

import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

if __name__ == "__main__":
    vec_env = make_vec_env(CustomEnv, n_envs=4, vec_env_cls=SubprocVecEnv)

AxKo commented 1 month ago

When I run your code I get the following error:

Process SpawnProcess-1:
Traceback (most recent call last):
  File "C:\Entwicklung\Anaconda3\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Entwicklung\Anaconda3\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Entwicklung\Anaconda3\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 29, in _worker
    env = _patch_env(env_fn_wrapper.var())
  File "C:\Entwicklung\Anaconda3\lib\site-packages\stable_baselines3\common\env_util.py", line 100, in _init
    env = _patch_env(env)
  File "C:\Entwicklung\Anaconda3\lib\site-packages\stable_baselines3\common\vec_env\patch_gym.py", line 33, in _patch_env
    raise ValueError(
ValueError: The environment is of type <class '__main__.CustomEnv'>, not a Gymnasium environment. In this case, we expect OpenAI Gym to be installed and the environment to be an OpenAI Gym environment.
Traceback (most recent call last):
  File "C:\Entwicklung\Anaconda3\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Entwicklung\Anaconda3\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

Process finished with exit code 1

qgallouedec commented 1 month ago

Please provide your system info (see bug issue template)

AxKo commented 1 month ago

sb3.get_system_info() gives:

wilhem commented 1 month ago

With a full example it may be clearer (above comment edited):

import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

if __name__ == "__main__":
    vec_env = make_vec_env(CustomEnv, n_envs=4, vec_env_cls=SubprocVecEnv)

This example does not work. You need a __call__ method in the class, otherwise I get this error:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/src/simulator/webots/controllers/supervisor_controller/supervisor_controller.py", line 470, in <module>
    main()
  File "/home/ubuntu/workspace/src/simulator/webots/controllers/supervisor_controller/supervisor_controller.py", line 464, in main
    train(model_class, env, model_path, logdir_path, n_max)
  File "/home/ubuntu/workspace/src/simulator/webots/controllers/supervisor_controller/supervisor_controller.py", line 368, in train
    vec_env = make_vec_env(env, n_envs = 4, vec_env_cls = SubprocVecEnv)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/site-packages/stable_baselines3/common/env_util.py", line 125, in make_vec_env
    vec_env = vec_env_cls([make_env(i + start_index) for i in range(n_envs)], **vec_env_kwargs)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 114, in __init__
    process.start()
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/multiprocessing/context.py", line 300, in _Popen
    return Popen(process_obj)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 479, in __getstate__
    return cloudpickle.dumps(self.var)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/opt/anaconda/anaconda3/envs/dl-gym/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
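
The final ValueError in the traceback above is raised while SubprocVecEnv pickles the environment factory (see the cloudpickle.dumps frame) to send it to a worker process. When an already-built env instance is passed in, everything the instance holds is pickled along with it, including native handles. A minimal standard-library sketch of the same failure follows. FakeSimulatorEnv and pickling_error are hypothetical names, and the ctypes pointer only stands in for whatever native object the real environment holds:

```python
import ctypes
import pickle


class FakeSimulatorEnv:
    """Hypothetical stand-in for an env that wraps a native simulator handle."""

    def __init__(self):
        # A ctypes pointer plays the role of the native handle here.
        self._handle = ctypes.pointer(ctypes.c_int(0))


def pickling_error(obj):
    """Return the error message raised when pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # broad on purpose: we only report the failure
        return str(exc)
    return None


env = FakeSimulatorEnv()

# Pickling the instance's attribute dict reaches the pointer and fails.
state_error = pickling_error(vars(env))
print(state_error)

# Pickling the whole instance fails as well.
instance_error = pickling_error(env)
print(instance_error is not None)
```

Passing the class, or any zero-argument factory, sidesteps this because each worker then builds its own environment, and its own handle, after the process has started.
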
qgallouedec commented 1 month ago

You need a __call__ method in the class, otherwise I get this error:

No, we don't. Gym envs aren't meant to be callable.
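
One way to see why the __call__ workaround is fragile: make_vec_env expects a factory that builds a fresh environment on every call, whereas a __call__ that returns self hands back the same object each time. The plain-Python sketch below involves no SB3 code, and TinyEnv and build_envs are hypothetical names. It contrasts the two:

```python
class TinyEnv:
    """Hypothetical stand-in for a gym.Env subclass."""

    def __init__(self):
        self.steps = 0

    def __call__(self):
        # The workaround from the thread: makes an instance "callable".
        return self


def build_envs(factory, n_envs):
    """Call the factory once per sub-environment, as a vectorized wrapper would."""
    return [factory() for _ in range(n_envs)]


from_class = build_envs(TinyEnv, n_envs=4)       # four independent envs
from_instance = build_envs(TinyEnv(), n_envs=4)  # the same env, four times

distinct_from_class = len({id(e) for e in from_class})
distinct_from_instance = len({id(e) for e in from_instance})
print(distinct_from_class, distinct_from_instance)  # 4 1
```

With a single sub-environment the difference is invisible, which may be why the workaround appeared to work with the default n_envs.
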

qgallouedec commented 1 month ago

When I run your code I get the following error:

Can you double-check that you've actually run the example that I gave?

What does the following output?

import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

if __name__ == "__main__":
    env = CustomEnv()
    print(isinstance(env, gym.Env))

AxKo commented 1 month ago

Can you double-check that you've actually run the example that I gave?

That was surely your example! And now "print(isinstance(env, gym.Env))" gives True, as it should be.

However, as I mentioned in my first post "isinstance(env, gym.Env)" inside _patch_env() is then False. Your code behaves exactly as my own code.

qgallouedec commented 1 month ago

To summarize, is this what you're encountering?

import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import _patch_env

class CustomEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

if __name__ == "__main__":
    env = CustomEnv()
    print(isinstance(env, gym.Env))  # True
    _patch_env(env)  # Fails: The environment is of type <class '__main__.CustomEnv'>, not a Gymnasium ...

qgallouedec commented 1 month ago

Do you also have the issue with DummyVecEnv? (default vec_env_cls)

make_vec_env(CustomEnv, n_envs=4)

AxKo commented 1 month ago

The direct call to _patch_env(env) does NOT give an error, only when it happens inside make_vec_env(). I guess it has something to do with the fact that it is then called from a spawned process!?

make_vec_env() with DummyVecEnv works fine. And as I understand that also does NOT involve creating a new process.

P.S. I have to go now. Will check back tomorrow.

wilhem commented 1 month ago

From my side:

without the __call__ method, no matter what I do, I get the following error: TypeError: 'CustomEnv' object is not callable

By the way, the snippet

env = CustomEnv()
print(isinstance(env, gym.Env))

prints True.

qgallouedec commented 1 month ago

@wilhem @AxKo It's getting pretty complicated to figure out who's getting what error. So to sum up, the problem is that under Windows, you get:

from gymnasium import spaces
import gymnasium as gym
from stable_baselines3.common.env_util import _patch_env, make_vec_env, SubprocVecEnv

class CustomEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

if __name__ == "__main__":
    print(isinstance(CustomEnv(), gym.Env))  # True
    _patch_env(CustomEnv())  # OK
    make_vec_env(CustomEnv, n_envs=4)  # OK
    make_vec_env(CustomEnv, n_envs=4, vec_env_cls=SubprocVecEnv)  # Fails

Traceback:

ValueError: The environment is of type <class '__main__.CustomEnv'>, not a Gymnasium environment. In this case, we expect

System info:

OS: Windows-10-10.0.22631-SP0 10.0.22631
Python: 3.10.9
Stable-Baselines3: 2.3.2
PyTorch: 2.3.0+cpu
GPU Enabled: False
Numpy: 1.23.5
Cloudpickle: 2.0.0
Gymnasium: 0.28.1
OpenAI Gym: 0.23.1

Can you confirm?

From now on, if your comment doesn't relate to this issue precisely, please open a new issue.

wilhem commented 1 month ago

For some unknown reason, it works if the environment is registered, like here:

from gymnasium.envs.registration import register
# Example for the CartPole environment
register(
    # unique identifier for the env `name-version`
    id="CartPole-v1",
    # path to the class for creating the env
    # Note: entry_point also accepts a class as input (and not only a string)
    entry_point="gymnasium.envs.classic_control:CartPoleEnv",
    # Max number of steps per episode, using a `TimeLimitWrapper`
    max_episode_steps=500,
)

Please try that and remove the __call__ method. Make sure entry_point refers to the class of the environment, e.g.:

from gymnasium.envs.registration import register

register(
    id="Robot-v1",
    entry_point=CustomEnv,  # <= Not a string, but the class name!
    max_episode_steps=500,
)

Source: link

About my system: I'm using Ubuntu 22.04

qgallouedec commented 1 month ago

So it's not Windows only but Ubuntu as well? I can't reproduce the error on Ubuntu.

AxKo commented 1 month ago

@qgallouedec: Yes, your above summary is correct for me.

In addition, using a registered environment also works for me:

make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)  # ok

However, registration does NOT work for me:

register(id='CustomEnv-v0', entry_point=CustomEnv)
print(gym.envs.registry.keys())  # => contains 'CustomEnv-v0'
make_vec_env('CustomEnv-v0', n_envs=4, vec_env_cls=SubprocVecEnv)  # => Error: 'CustomEnv' doesn't exist

P.S. can you actually reproduce my problem under Windows??

wilhem commented 1 month ago

Stupid question: is the class CustomEnv in another file or in the same file, where you register the environment?

AxKo commented 1 month ago

The same file! I just added my 3 lines of code to the example code of Quentin.

wilhem commented 1 month ago

Can you post your complete code, with the unnecessary parts removed?

wilhem commented 1 month ago

What happens if you pass an instance of the class and not only the class itself?

register(id='CustomEnv-v0', entry_point=CustomEnv())  # <= CustomEnv() instead of CustomEnv

AxKo commented 1 month ago

It makes no difference if I use CustomEnv or CustomEnv(). Here is the complete code:

import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.env_util import _patch_env
from gymnasium.envs.registration import register

class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

if __name__ == "__main__":
    print(isinstance(CustomEnv(), gym.Env))       # True
    _patch_env(CustomEnv())                        # ok
    make_vec_env(CustomEnv, n_envs=4)              # ok
    make_vec_env(CustomEnv, n_envs=4, vec_env_cls=SubprocVecEnv)  # fails
    make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)  # ok

    register(id='CustomEnv-v0', entry_point=CustomEnv)
    print(gym.envs.registry.keys())     # => contains  'CustomEnv-v0'
    make_vec_env('CustomEnv-v0', n_envs=4, vec_env_cls=SubprocVecEnv) # Error: 'CustomEnv' doesn't exist

wilhem commented 1 month ago

Can you put the registration of the environment outside the main? Then please post the error.

import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.env_util import _patch_env
from gymnasium.envs.registration import register

class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, seed=None, options=None):
        return self.observation_space.sample(), {}

register(id='CustomEnv-v0', entry_point=CustomEnv)

if __name__ == "__main__":
    print(isinstance(CustomEnv(), gym.Env))       # True
    _patch_env(CustomEnv())                        # ok
    make_vec_env(CustomEnv, n_envs=4)              # ok
    make_vec_env(CustomEnv, n_envs=4, vec_env_cls=SubprocVecEnv)  # fails
    make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)  # ok

    print(gym.envs.registry.keys())     # => contains  'CustomEnv-v0'
    make_vec_env('CustomEnv-v0', n_envs=4, vec_env_cls=SubprocVecEnv) # Error: 'CustomEnv' doesn't exist

AxKo commented 1 month ago

That WORKED !! I guess it is because now the registration is performed by each new process.

I only get this warning about render_mode (which is an unrelated and minor issue): UserWarning: WARN: The environment is being initialised with render_mode='rgb_array' that is not in the possible render_modes ([]).

Thanks !
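
A rough way to check that guess without starting any processes: with the spawn start method, each worker re-runs the main script under the module name __mp_main__, so module-level statements execute again but the if __name__ == "__main__": block does not. The sketch below imitates that with runpy from the standard library. The script text, REGISTRY, register, and registry_seen_as are hypothetical stand-ins, not Gymnasium's registry:

```python
import os
import runpy
import tempfile
import textwrap

# A toy training script with one module-level and one guarded registration.
SCRIPT = textwrap.dedent('''
    REGISTRY = {}

    def register(env_id):
        REGISTRY[env_id] = True

    register("ModuleLevel-v0")      # runs every time the script is executed

    if __name__ == "__main__":
        register("MainGuard-v0")    # runs only when executed as the main program
''')


def registry_seen_as(run_name):
    """Run the toy script under the given module name and list its registry."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        module_globals = runpy.run_path(path, run_name=run_name)
        return sorted(module_globals["REGISTRY"])


parent_view = registry_seen_as("__main__")     # what the parent process sees
child_view = registry_seen_as("__mp_main__")   # what a spawned worker would see
print(parent_view)
print(child_view)
```

Only the module-level registration shows up in the worker's view, which matches the observation that moving register(...) out of the main block fixed the lookup.
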

wilhem commented 1 month ago

I don't know... but I think that SB3 is extremely picky compared to other frameworks

araffin commented 2 weeks ago

Closing as the original question has been answered here: https://github.com/DLR-RM/stable-baselines3/issues/1968#issuecomment-2232592563

XiaobenLi00 commented 2 weeks ago

That WORKED !! I guess it is because now the registration is performed by each new process.

I only get this warning about render_mode (which is an unrelated and minor issue): UserWarning: WARN: The environment is being initialised with render_mode='rgb_array' that is not in the possible render_modes ([]).

Thanks !

I also have a problem with render_mode:

UserWarning: WARN: The environment is being initialised with render_mode='rgb_array' that is not in the possible render_modes ([]).

Did you solve it?

araffin commented 2 weeks ago

This comes from Gymnasium, not SB3. Please have a look at their documentation.

wilhem commented 2 weeks ago

That WORKED !! I guess it is because now the registration is performed by each new process. I only get this warning about render_mode (which is an unrelated and minor issue): UserWarning: WARN: The environment is being initialised with render_mode='rgb_array' that is not in the possible render_modes ([]). Thanks !

I also have a problem with render_mode:

UserWarning: WARN: The environment is being initialised with render_mode='rgb_array' that is not in the possible render_modes ([]).

Did you solve it?

In your custom environment, add the following class attribute:

class CustomEnv(gym.Env):

    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 30}
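
A minimal sketch of why declaring render_modes silences the warning, assuming the check simply compares the requested mode against metadata["render_modes"]. The helper supports_render_mode and both classes are hypothetical; this is not Gymnasium's code:

```python
class BareEnv:
    """No render modes declared: every requested mode looks unsupported."""

    metadata = {"render_modes": []}


class DeclaredEnv:
    """Declares the modes it can render, as suggested above."""

    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 30}


def supports_render_mode(env_cls, render_mode):
    """Hypothetical helper mirroring a membership check on the metadata."""
    return render_mode in env_cls.metadata.get("render_modes", [])


bare_ok = supports_render_mode(BareEnv, "rgb_array")
declared_ok = supports_render_mode(DeclaredEnv, "rgb_array")
print(bare_ok)      # False
print(declared_ok)  # True
```

Declaring a mode only addresses the warning; the environment's render() is still expected to handle that mode.
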