Farama-Foundation / MicroRTS-Py

A simple and highly efficient RTS-game-inspired environment for reinforcement learning (formerly Gym-MicroRTS)
MIT License

Add Pettingzoo Bindings #59

Open vwxyzjn opened 2 years ago

vwxyzjn commented 2 years ago

TLDR: PettingZoo has become the standard library for multi-agent environments, and we want to support PettingZoo bindings in gym-microrts.

This project https://github.com/vwxyzjn/gym-microrts is an RL environment for an RTS game, where many units are constantly spawning and dying. Because of the multi-agent nature of RTS games, gym-microrts should fit PettingZoo’s interface fairly seamlessly.

We currently need help on the following fronts:

  1. Support PettingZoo’s API in gym-microrts. The current API is already similar to PettingZoo’s (similar observation space, action space, and support for action masks), but it would be nice to officially adopt PettingZoo; a sketch of the target interaction loop follows this list.
  2. Make SB3 work with PettingZoo. We recently had a contribution by @kachayev that made SB3 work with gym-microrts: https://github.com/kachayev/gym-microrts-paper-sb3. It would be nice to have an SB3 demo that works with gym-microrts’s PettingZoo API and, ultimately, supports all PettingZoo environments that have action masks, such as Chess or Go.
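
For reference, a minimal sketch of the interaction loop that adopting PettingZoo's AEC API would imply. This is an assumption about the target design, not existing API: microrts_aec_v0 is a hypothetical module name, and the 4-tuple return of env.last() matches the PettingZoo API current at the time of this issue.

env = microrts_aec_v0.env()  # hypothetical two-player gym-microrts AEC env
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    # a real agent would restrict sampling to the legal actions in obs["action_mask"],
    # the way PettingZoo's Chess and Go envs expose their move masks
    action = None if done else env.action_space(agent).sample()
    env.step(action)
env.close()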

Setting up an issue to track progress.

@BolunDai0216 suggests he would like to take a stab at this.

kachayev commented 2 years ago

I have a working version of microrts integrated with PettingZoo that exposes each unit in the game as an independent agent :) I assume you are describing a less "extreme" API where agent = player, right?

Is the goal to make the PettingZoo-to-SB3 wrappers work (e.g. vectorization)?
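
For context on the vectorization question: the usual route is SuperSuit, which can flatten a PettingZoo parallel env into an SB3-compatible vector env. A rough sketch; microrts_parallel_v0 is a hypothetical parallel-API gym-microrts env, and this ignores action masking, which stock SB3 PPO does not handle (that would need something like sb3-contrib's MaskablePPO or the approach in the linked repo).

import supersuit as ss
from stable_baselines3 import PPO

parallel_env = microrts_parallel_v0.parallel_env()  # hypothetical parallel-API gym-microrts env
# flatten the multi-agent env into a single-agent vector env ...
vec_env = ss.pettingzoo_env_to_vec_env_v1(parallel_env)
# ... then stack copies and expose SB3's VecEnv interface
vec_env = ss.concat_vec_envs_v1(vec_env, num_vec_envs=4, num_cpus=1, base_class="stable_baselines3")

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=100_000)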

vwxyzjn commented 2 years ago

Oh @kachayev that's awesome! @BolunDai0216 is interested in working on this. Would you mind sharing your version here?

kachayev commented 2 years ago

Absolutely! I'll dig it up tomorrow

kachayev commented 2 years ago

Okay, I completely blanked on this. Below is (part of) the code I'm using in my experiments; I tried to cherry-pick it without any dependencies on my implementation of the environment. I think that the use case of having an API for 2 players would be much easier: there would be no problems with a dynamic number of agents, the observation and action spaces are the same for both players, no problems with rewards/infos, etc. It would just take a little bit of index juggling when putting observations and actions in place. Having each unit as a separate agent, as you can see here, is more involved, and I certainly don't have a fully fledged solution that covers most use cases (this one is tied to my specific algorithm only). Also, note that this is the AEC API; I'm not sure whether the goal here is to have only AEC or other APIs as well. Support for parallel_env would be cool too.
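
On the parallel_env point, one option worth noting: PettingZoo ships converters between the two APIs, so an AEC implementation can also be exposed as a parallel env rather than maintaining both. A rough sketch, assuming the AEC env below satisfies the full AEC spec; the converter is named aec_to_parallel in recent PettingZoo releases (to_parallel in older ones), and some_bot is a placeholder opponent.

from pettingzoo.utils.conversions import aec_to_parallel  # `to_parallel` in older PettingZoo releases

aec_env = MicroRTSAEC(opponent=some_bot)  # the AEC sketch shared below
parallel_env = aec_to_parallel(aec_env)
# parallel API: one dict of actions in, one dict of obs/rewards/dones/infos out
obs = parallel_env.reset()
actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
obs, rewards, dones, infos = parallel_env.step(actions)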

kachayev commented 2 years ago
import gym
import numpy as np

from pettingzoo import AECEnv
from pettingzoo.utils import agent_selector

# assumed import path for the underlying vec env shipped with gym-microrts
from gym_microrts.envs.vec_env import MicroRTSGridModeSharedMemVecEnv


class MicroRTSAEC(AECEnv, MicroRTSGridModeSharedMemVecEnv):

    def __init__(
        self,
        opponent,
        agent_vision_patch=(5,5),
        partial_obs=False,
        max_steps=2000,
        render_theme=2,
        frame_skip=0,
        map_path="maps/10x10/basesTwoWorkers10x10.xml",
        reward_weight=np.array([0.0, 1.0, 0.0, 0.0, 0.0, 5.0]),
    ):
        self.agent_vision_patch = agent_vision_patch
        # initialize the underlying vec env directly: a single environment
        # (num_selfplay_envs=0, num_bot_envs=1) played against the given bot opponent
        MicroRTSGridModeSharedMemVecEnv.__init__(
            self,
            0,
            1,
            partial_obs,
            max_steps,
            render_theme,
            frame_skip,
            [opponent],
            [map_path],
            reward_weight,
        )
        self._agent_selector = agent_selector([]) # empty before we start
        self.agent_selection = None
        self.agent_observation_space = gym.spaces.Box(
            low=0.0,
            high=1.0,
            shape=(self.agent_vision_patch[0], self.agent_vision_patch[1], sum(self.num_planes)),
            dtype=np.int32
        )
        self.agent_action_space = gym.spaces.MultiDiscrete(np.array(self.action_space_dims))
        self._reset_actions = np.zeros_like(self.actions)

    def observation_space(self, agent):
        """All agents have the same obs space."""
        return self.agent_observation_space

    def action_space(self, agent):
        """All agents have the same action space."""
        return self.agent_action_space

    def reset(self):
        """Note that we don't return obs here as we do with Gym."""
        MicroRTSGridModeSharedMemVecEnv.reset(self)
        np.copyto(self.actions, self._reset_actions)
        all_agents = self.agents
        self._agent_selector.reinit(all_agents)
        self.agent_selection = self._agent_selector.next()
        self.infos = {agent:{} for agent in all_agents}
        self.dones = {agent:False for agent in all_agents}
        self._cumulative_rewards = {agent:0. for agent in all_agents}

    def step(self, action):
        agent = self.agent_selection
        # fill in the action for the currently selected agent (unit)
        np.copyto(self.actions[0][agent], action)
        if self._agent_selector.is_last():
            # every unit has acted: advance the underlying game by one tick
            all_agents = self.agents
            obs, rewards, dones, infos = self.step_wait()
            self.infos = {agent: infos[0].copy() for agent in all_agents}
            self.dones = {agent: dones[0] for agent in all_agents}
            self._cumulative_rewards = {agent: rewards[0] / len(all_agents) for agent in all_agents}
            # reset actions now, as we already used them in the environment
            np.copyto(self.actions, self._reset_actions)
            # units may have spawned or died, so rebuild the agent order
            self._agent_selector.reinit(self.agents)
        # hand control to the next agent
        self.agent_selection = self._agent_selector.next()

    def observe(self, agent):
        return self.obs[0][agent]

    @property
    def max_num_agents(self):
        return self.height * self.width    

    @property
    def game_state(self):
        return self.vec_client[0].gs

    @property
    def agents(self):
        return [u.getPosition() for u in self.game_state.getUnits()]
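
A minimal usage sketch for the class above, assuming the standard AEC driver loop; microrts_ai.coacAI is one of the bot opponents shipped with gym-microrts, and the random per-unit action ignores the action mask.

from gym_microrts import microrts_ai  # built-in bot opponents

env = MicroRTSAEC(opponent=microrts_ai.coacAI)
env.reset()
for agent in env.agent_iter(max_iter=10_000):
    obs, reward, done, info = env.last()
    # random action for the current unit; a real policy would also apply its action mask
    action = None if done else env.action_space(agent).sample()
    env.step(action)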

BolunDai0216 commented 2 years ago

Thanks for sharing, this definitely gives me a nice place to start.

vwxyzjn commented 2 years ago

@kachayev thanks for sharing this!

I think that the use case of having an API for 2 players would be much easier

I agree. My first thought is that gym-microrts's PettingZoo API should be very similar to chess's PettingZoo API, which has only two players: https://www.pettingzoo.ml/classic/chess
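
For concreteness, chess in PettingZoo has a fixed two-agent roster and exposes the legal-move mask inside the observation dict, which is roughly the shape gym-microrts's two-player API could mirror. A sketch against chess_v5; the version suffix and the 4-tuple env.last() return depend on the PettingZoo release.

import numpy as np
from pettingzoo.classic import chess_v5

env = chess_v5.env()
env.reset()
print(env.possible_agents)  # ['player_0', 'player_1'] -- a fixed two-player roster
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    if done:
        action = None
    else:
        # legal moves arrive as a binary mask alongside the board observation
        action = np.random.choice(np.flatnonzero(obs["action_mask"]))
    env.step(action)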

kachayev commented 2 years ago

@BolunDai0216 Absolutely!

@vwxyzjn if my memory doesn't fail me, chess is also implemented as AEC, so the API would look the same. I meant that the implementation would be easier with a static number of agents.