DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Parallel Multi Agent Environment observation_space bug #910

Closed domist07 closed 2 years ago

domist07 commented 2 years ago

🤖 Custom Gym Environment

 Describe the bug

I am trying to create a custom multi-agent parallel environment. As in the parallel PettingZoo environments, I want to use dictionaries keyed by agent, but stable_baselines3 throws observation space errors. I also tried checking pettingzoo.butterfly.pistonball_v6.parallel_env – it does not work either. Therefore I created a minimal environment to reproduce the error. Did I overlook something, or is it a bug?

 Code example

import numpy as np

import gym
from gym import spaces

from stable_baselines3 import A2C
from stable_baselines3.common.env_checker import check_env

from pettingzoo import ParallelEnv

class MinimalParallel(ParallelEnv):
    """Minimal parallel environment"""
    metadata = {'render.modes': ['human']}

    def __init__(self, agent_n):
        super(MinimalParallel, self).__init__()
        self.agent_n = agent_n
        self.agents = list(range(self.agent_n))
        # Define action and observation space
        self.action_space = dict(
            zip(self.agents, [spaces.Discrete(2)] * self.agent_n))
        self.observation_space = dict(
            zip(self.agents, [spaces.Discrete(2)] * self.agent_n))
        self.observation = dict(
            zip(self.agents, np.random.randint(0, 2, self.agent_n)))
        self.info = dict(zip(self.agents, {}))
        pass

    def step(self, action):
        self.reward = dict(
            zip(self.agents, np.random.randint(0, 2, self.agent_n)))
        self.done = dict(zip(self.agents, bool(
            np.random.randint(0, 2, self.agent_n))))
        return self.observation, self.reward, self.done, self.info

    def reset(self):
        self.__init__(self.agent_n)
        return self.observation  # reward, done, info can't be included

    def render(self, mode='human'):
        pass

    def close(self):
        pass

env = MinimalParallel(2)
check_env(env, warn=True, skip_render_check=True)

model = A2C("MlpPolicy", env, verbose=1).learn(1000)

Traceback (most recent call last):
  File "c:\Users\PythonProjects\minimal_env\minimal_env\parallel.py", line 47, in <module>
    check_env(env, warn=True, skip_render_check=True)
  File "C:\Users\Miniconda3\envs\stblbl3\lib\site-packages\stable_baselines3\common\env_checker.py", line 250, in check_env
    _check_spaces(env)
  File "C:\Users\Miniconda3\envs\stblbl3\lib\site-packages\stable_baselines3\common\env_checker.py", line 195, in _check_spaces
    assert isinstance(env.observation_space, spaces.Space), "The observation space must inherit from gym.spaces" + gym_spaces
AssertionError: The observation space must inherit from gym.spaces cf https://github.com/openai/gym/blob/master/gym/spaces/

 System Info

Describe the characteristic of your environment:

 Checklist

Miffyli commented 2 years ago

As it says in the exception, the observation/action spaces should be gym.spaces objects (in this case, gym.spaces.Dict). See Gym docs on creating environments.
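For reference, a minimal sketch of how the space definitions in the reproduction above could be wrapped in gym.spaces.Dict (reusing the names from the example code; this only illustrates the space type the checker expects, it does not make SB3 multi-agent aware, and SB3 does not support Dict action spaces, see below):

from gym import spaces

# Inside MinimalParallel.__init__: wrap the per-agent spaces in gym.spaces.Dict
# so that env.observation_space / env.action_space inherit from gym.spaces.Space.
# (String agent names are also commonly used as keys instead of integers.)
self.action_space = spaces.Dict(
    {agent: spaces.Discrete(2) for agent in self.agents})
self.observation_space = spaces.Dict(
    {agent: spaces.Discrete(2) for agent in self.agents})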

Do note that SB3 was not designed for multi-agent environments, and if you get bugs with this, we are unfortunately unable to provide support. For that you should see PettingZoo's docs and forums.
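For completeness, one commonly used route for training a parallel PettingZoo environment with SB3 is to wrap it with SuperSuit so it appears as a vectorized single-agent environment with shared policy parameters. A rough sketch, assuming the supersuit package is installed; wrapper names and the preprocessing you need differ between versions and environments:

import supersuit as ss
from pettingzoo.butterfly import pistonball_v6
from stable_baselines3 import PPO

# Turn the parallel multi-agent env into a vectorized single-agent env:
# every agent is exposed as one sub-environment sharing the same policy.
env = pistonball_v6.parallel_env()
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, 2, num_cpus=0, base_class="stable_baselines3")

# Pistonball observations are images, hence CnnPolicy.
model = PPO("CnnPolicy", env, verbose=1)
model.learn(10_000)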