GymWrapper action space creation bug

aadharna commented 2 years ago

Found some breaking behaviour in GymWrapper::_create_action_space().

https://github.com/Bam4d/Griddly/blob/3ea10425d1981d2bbc07ba6d3efa02d199fd3739/python/griddly/GymWrapper.py#L417-L419

If self.action_space is, e.g., Discrete(2) while action_space is [Discrete(2), Discrete(2), ...], then the linked lines break, because zip cannot iterate over a Discrete space.

Usually this is masked, because the multiple agents have a MultiDiscrete action space. In that case zip unpacks a self.action_space of MultiDiscrete([2, 5]) into [Discrete(2), Discrete(5)] and iterates over that, so an action_space of [MultiDiscrete([2, 5]), MultiDiscrete([2, 5])] behaves well.
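The difference between the two cases can be reproduced without Griddly or gym. This is only a sketch: `FakeDiscrete`, `FakeMultiDiscrete` and `pair_spaces` are hypothetical stand-ins, not the real `gym.spaces` classes or the actual GymWrapper code. It assumes only what is described above: a MultiDiscrete space can be iterated into Discrete components, a plain Discrete space cannot, and the linked lines zip self.action_space against the per-player list.

```python
class FakeDiscrete:
    """Stand-in for gym.spaces.Discrete: has a size but is NOT iterable."""

    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return f"Discrete({self.n})"


class FakeMultiDiscrete:
    """Stand-in for gym.spaces.MultiDiscrete: indexable, so zip() can iterate it."""

    def __init__(self, nvec):
        self.nvec = list(nvec)

    def __len__(self):
        return len(self.nvec)

    def __getitem__(self, index):
        # Raises IndexError past the end, which is what stops iteration.
        return FakeDiscrete(self.nvec[index])

    def __repr__(self):
        return f"MultiDiscrete({self.nvec})"


def pair_spaces(own_space, per_player_spaces):
    """Mimic zipping self.action_space with the list of per-player spaces."""
    return list(zip(own_space, per_player_spaces))


players = 2

# MultiDiscrete case: zip walks the space component by component, no error.
multi = FakeMultiDiscrete([2, 5])
multi_pairs = pair_spaces(multi, [multi] * players)

# Discrete case: the space itself is not iterable, so zip raises TypeError.
single = FakeDiscrete(2)
try:
    pair_spaces(single, [single] * players)
    discrete_error = None
except TypeError as exc:
    discrete_error = exc

print(multi_pairs)
print(repr(discrete_error))
```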

I fixed it by adding:

if isinstance(self.action_space, gym.spaces.Discrete):
    self.action_space = [self.action_space]

between lines 417 and 418.
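The same guard could also be written as a small helper that leaves self.action_space untouched and only normalizes the value handed to zip. This is a sketch under assumptions, not a tested patch for Griddly: `FakeDiscrete` is a hypothetical stand-in for `gym.spaces.Discrete`, and `as_space_list` is an illustrative name that does not exist in the library.

```python
class FakeDiscrete:
    """Hypothetical stand-in for gym.spaces.Discrete (not iterable)."""

    def __init__(self, n):
        self.n = n


def as_space_list(space):
    """Wrap a lone Discrete-like space in a list; pass anything else through."""
    if isinstance(space, FakeDiscrete):
        return [space]
    return space


player_count = 4
own_space = FakeDiscrete(2)
per_player = [FakeDiscrete(2) for _ in range(player_count)]

# After wrapping, zip() has something to iterate over instead of raising.
pairs = list(zip(as_space_list(own_space), per_player))
print(len(pairs))
```

Note that zip stops at its shortest input, so wrapping a single space yields a single pair regardless of the player count. Whether that matches what the linked lines intend is not something the snippet can confirm.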

This occurred when running a multi-agent env where each agent has a Discrete action space.


import os
import griddly
from pprint import pprint
from griddly import gd
from griddly.util.rllib.environment.core import RLlibEnv, RLlibMultiAgentWrapper

if __name__ == "__main__":

    env_config = {
        'environment_name': 'sensor',
        "yaml_file": 'sensor_game_multi_agent.yaml',
        # 'yaml_file': "D:\miniconda\envs\sensor\Lib\site-packages\griddly\\resources\games\Multi-Agent\\foragers.yaml",
        "global_observer_type": gd.ObserverType.ISOMETRIC,
        'player_observer_type': gd.ObserverType.BLOCK_2D,
        'max_steps': 250,
        'level': 0,
    }

    env = RLlibEnv(env_config)
    env.enable_history(True)
    # pprint(env.unwrapped.action_input_mappings)
    # pprint(env.action_space)
    # env.on_episode_start(0, 0)
    env = RLlibMultiAgentWrapper(env, env_config)
    env.enable_history(True)

aadharna commented 2 years ago

Here's the yaml again:

Version: "0.1"
Environment:
  Name: Particle Sensor Game
  Description: Multiple bang-bang sensors track a particle
  Observers:
    Block2D:
      TileSize: 24
    Isometric:
      TileSize: [ 32, 48 ]
      IsoTileHeight: 16
      IsoTileDepth: 4
      BackgroundTile: oryx/oryx_iso_dungeon/grass-1.png
    Vector:
      IncludePlayerId: true
  Player:
    AvatarObject: gnome
    Count: 4
    Observer:
      TrackAvatar: true
      Height: 5
      Width: 5
      OffsetX: 0
      OffsetY: 0
  Levels:
    - |
      s  .  .   .  .  .  .  .   .  .  .  .  .   .  .  .  .  .   .  .
      .  .  .   .  .  .  .  .   .  .  .  .  .   .  .  .  .  .   .  . 
      .  .  g1  .  .  .  .  g2  .  .  .  .  g3  .  .  .  .  g4  .  . 
      .  .  .   .  .  .  .  .   .  .  .  .  .   .  .  .  .  .   .  . 
      .  .  .   .  .  .  .  .   .  .  .  .  .   .  .  .  .  .   .  .

  Termination:
    Win:
      - eq: [spider:count, 0]

Actions:

  - Name: spider_random_movement
    InputMapping:
      Internal: true
    Behaviours:
      - Src:
          Object: spider
          Commands:
            - mov: _dest
            - exec:
                Action: spider_random_movement
                Randomize: true
                Delay: 5
        Dst:
          Object: [_empty, right_exit, gnome]
      - Src:
          Object: spider
          Commands:
            - exec:
                Action: spider_random_movement
                Randomize: true
                Delay: 5
        Dst:
          Object: [spider, water, _boundary, gnome]
      - Src:
          Object: spider
          Commands:
            - remove: true
            - reward: 1
        Dst:
          Object: right_exit

#  - Name: move
#    Behaviours:
#      - Src:
#          Object: gnome
#          Commands:
#            - mov: _dest
#        Dst:
#          Object: _empty

  - Name: switch
    InputMapping:
      Inputs:
        1:
          Description: flip switch
          VectorToDest: [ 0, 0 ]
      Relative: true
    Behaviours:
      # turn on spotlight
      - Src:
          Object: gnome
          Preconditions:
            - eq: [spotlight, 0]
          Commands:
            - set: [spotlight, 1]
            - set_tile: 1
            - print: ["turning ----- on ----- spotlight"]
        Dst:
          Object: gnome

      # turn off spotlight
      - Src:
          Object: gnome
          Preconditions:
            - eq: [spotlight, 1]
          Commands:
            - set: [spotlight, 0]
            - set_tile: 0
            - print: ["turning ===== off ===== spotlight"]
        Dst:
          Object: gnome

  - Name: count_nearby_spider
    Probability: 1.0
    Trigger:
      Type: RANGE_BOX_AREA
      Range: 2
    Behaviours:
      # If the spider is within 2 of the gnome and the gnome is on, give point
      - Src:
          Object: gnome
          Preconditions:
            - eq: [spotlight, 1]
          Commands:
            - set: [spider_counter, 1]
            - print: ["spooders nearby: ", spider_counter]
        Dst:
          Object: spider

  - Name: give_feedback
    InputMapping:
      Inputs:
        '1':
          Description: provide feedback to the agent(s)
          VectorToDest:
            - 0
            - 0
      Internal: true
    Behaviours:
      - Src:
          Object: gnome
          Preconditions:
            - eq:
                - spotlight
                - 1
          Commands:
            - if:
                Conditions:
                  eq:
                    - spider_counter
                    - 1
                OnTrue:
                  - reward: 2
                  - print:
                      - yay
                OnFalse:
                  - reward: -1
                  - print:
                      - boo
            - set:
                - spider_counter
                - 0
            - print:
                - 'spider_counter after feedback: '
                - spider_counter
                - spotlight
            - exec:
                Action: give_feedback
                ActionId: 1
                Delay: 1
        Dst:
          Object: gnome
      - Src:
          Object: gnome
          Preconditions:
            - eq:
                - spotlight
                - 0
          Commands:
            - if:
                Conditions:
                  eq:
                    - spider_counter
                    - 1
                OnTrue:
                  - reward: 2
                  - print:
                      - yay2
                OnFalse:
                  - reward: -1
                  - print:
                      - boo2
            - set:
                - spider_counter
                - 0
            - print:
                - 'spider_counter after feedback: '
                - spider_counter
                - spotlight
            - exec:
                Action: give_feedback
                ActionId: 1
                Delay: 1
        Dst:
          Object: gnome
Objects:
  - Name: gnome
    Z: 2
    MapCharacter: g
    InitialActions:
      - Action: give_feedback
        ActionId: 1
        Delay: 1
    Variables:
      - Name: spotlight
        InitialValue: 0
      - Name: spider_counter
        InitialValue: 0
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/avatars/gnome-1.png
        - Image: oryx/oryx_iso_dungeon/avatars/spider-fire-1.png
      Block2D:
        - Shape: square
          Color: [ 0.0, 0.8, 0.0 ]
          Scale: 0.5
        - Shape: triangle
          Color: [0.0, 0.5, 0.2]
          Scale: 0.8

  - Name: spider
    Z: 1
    InitialActions:
      - Action: spider_random_movement
        Randomize: true
    MapCharacter: s
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/avatars/spider-1.png
      Block2D:
        - Shape: triangle
          Color: [ 0.2, 0.2, 0.9 ]
          Scale: 0.5

  - Name: water
    MapCharacter: w
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/water-1.png
          Offset: [0, 4]
          TilingMode: ISO_FLOOR
      Block2D:
        - Color: [ 0.0, 0.0, 0.8 ]
          Shape: square

  - Name: right_exit
    MapCharacter: e
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/water-1.png
          Offset: [0, 4]
          TilingMode: ISO_FLOOR
      Block2D:
        - Color: [ 0.0, 0.0, 0.8 ]
          Shape: square

aadharna commented 2 years ago

Speaking of gym.spaces strangeness in RLlibEnv and RLlibMultiAgentWrapper:

In RLlibEnv, why do we use only a single instance of the action space list if there are multiple players? https://github.com/Bam4d/Griddly/blob/3ea10425d1981d2bbc07ba6d3efa02d199fd3739/python/griddly/util/rllib/environment/core.py#L131-L139

Doing so makes the action space in the RLlibMultiAgentWrapper incorrect. It should be self.action_space = gym.spaces.Dict({a: self.single_action_space for a in self._active_agents}), but it is instead just, e.g., Discrete(x).

So, in RLlibMultiAgentWrapper, I added:

def __init__(...):
    ...
    self.is_reset = False
    self.reset()

and

def reset(self, **kwargs):
    obs = super().reset(**kwargs)
    if not self.is_reset:
        self.is_reset = True
        self.single_action_space = self.action_space
        self.single_observation_space = self.observation_space
    self._active_agents.update([a + 1 for a in range(self.player_count)])
    self.action_space = gym.spaces.Dict({a:self.single_action_space for a in self._active_agents})
    self.observation_space = gym.spaces.Dict({a:self.single_observation_space for a in self._active_agents})
    return self._to_multi_agent_map(obs)
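The is_reset flag matters because reset() overwrites self.action_space: without caching the single-agent space on the first call, a second reset would wrap the already-wrapped mapping again. The sketch below shows that pattern in isolation. It is not Griddly code: `FakeBaseEnv` and `FakeMultiAgentWrapper` are hypothetical stand-ins, a plain dict takes the place of gym.spaces.Dict, a string takes the place of a real space, and the returned observation mapping is only a guess at what _to_multi_agent_map produces.

```python
class FakeBaseEnv:
    """Hypothetical stand-in for the wrapped env; exposes one shared space."""

    def __init__(self, player_count):
        self.player_count = player_count
        self.action_space = "Discrete(2)"  # placeholder for a real gym space

    def reset(self):
        return [f"obs-{p}" for p in range(self.player_count)]


class FakeMultiAgentWrapper(FakeBaseEnv):
    def __init__(self, player_count):
        super().__init__(player_count)
        self._active_agents = set()
        self.is_reset = False
        self.reset()

    def reset(self):
        obs = super().reset()
        if not self.is_reset:
            # Cache the single-agent space once, before it is overwritten below.
            self.is_reset = True
            self.single_action_space = self.action_space
        # Agent ids are 1-based, as in the snippet above.
        self._active_agents.update(range(1, self.player_count + 1))
        # Plain dict here; the snippet above uses gym.spaces.Dict.
        self.action_space = {
            a: self.single_action_space for a in self._active_agents
        }
        return {a: obs[a - 1] for a in sorted(self._active_agents)}


env = FakeMultiAgentWrapper(player_count=4)
second_obs = env.reset()  # a second reset must not nest the per-agent mapping
print(env.action_space)
```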

Bam4d / Griddly

GymWrapper action space creation bug #198