aadharna opened this issue 2 years ago
Here's the yaml again:
```yaml
Version: "0.1"
Environment:
  Name: Particle Sensor Game
  Description: Multiple bang-bang sensors track a particle
  Observers:
    Block2D:
      TileSize: 24
    Isometric:
      TileSize: [ 32, 48 ]
      IsoTileHeight: 16
      IsoTileDepth: 4
      BackgroundTile: oryx/oryx_iso_dungeon/grass-1.png
    Vector:
      IncludePlayerId: true
  Player:
    AvatarObject: gnome
    Count: 4
    Observer:
      TrackAvatar: true
      Height: 5
      Width: 5
      OffsetX: 0
      OffsetY: 0
  Levels:
    - |
      s . . . . . . . . . . . . . . . . . . .
      . . . . . . . . . . . . . . . . . . . .
      . . g1 . . . . g2 . . . . g3 . . . . g4 . .
      . . . . . . . . . . . . . . . . . . . .
      . . . . . . . . . . . . . . . . . . . .
  Termination:
    Win:
      - eq: [ spider:count, 0 ]

Actions:
  - Name: spider_random_movement
    InputMapping:
      Internal: true
    Behaviours:
      - Src:
          Object: spider
          Commands:
            - mov: _dest
            - exec:
                Action: spider_random_movement
                Randomize: true
                Delay: 5
        Dst:
          Object: [ _empty, right_exit, gnome ]
      - Src:
          Object: spider
          Commands:
            - exec:
                Action: spider_random_movement
                Randomize: true
                Delay: 5
        Dst:
          Object: [ spider, water, _boundary, gnome ]
      - Src:
          Object: spider
          Commands:
            - remove: true
            - reward: 1
        Dst:
          Object: right_exit

#  - Name: move
#    Behaviours:
#      - Src:
#          Object: gnome
#          Commands:
#            - mov: _dest
#        Dst:
#          Object: _empty

  - Name: switch
    InputMapping:
      Inputs:
        1:
          Description: flip switch
          VectorToDest: [ 0, 0 ]
      Relative: true
    Behaviours:
      # turn on spotlight
      - Src:
          Object: gnome
          Preconditions:
            - eq: [ spotlight, 0 ]
          Commands:
            - set: [ spotlight, 1 ]
            - set_tile: 1
            - print: [ "turning ----- on ----- spotlight" ]
        Dst:
          Object: gnome
      # turn off spotlight
      - Src:
          Object: gnome
          Preconditions:
            - eq: [ spotlight, 1 ]
          Commands:
            - set: [ spotlight, 0 ]
            - set_tile: 0
            - print: [ "turning ===== off ===== spotlight" ]
        Dst:
          Object: gnome

  - Name: count_nearby_spider
    Probability: 1.0
    Trigger:
      Type: RANGE_BOX_AREA
      Range: 2
    Behaviours:
      # If the spider is within 2 of the gnome and the gnome is on, give point
      - Src:
          Object: gnome
          Preconditions:
            - eq: [ spotlight, 1 ]
          Commands:
            - set: [ spider_counter, 1 ]
            - print: [ "spooders nearby: ", spider_counter ]
        Dst:
          Object: spider

  - Name: give_feedback
    InputMapping:
      Inputs:
        '1':
          Description: provide feedback to the agent(s)
          VectorToDest: [ 0, 0 ]
      Internal: true
    Behaviours:
      - Src:
          Object: gnome
          Preconditions:
            - eq: [ spotlight, 1 ]
          Commands:
            - if:
                Conditions:
                  eq: [ spider_counter, 1 ]
                OnTrue:
                  - reward: 2
                  - print: [ "yay" ]
                OnFalse:
                  - reward: -1
                  - print: [ "boo" ]
            - set: [ spider_counter, 0 ]
            - print: [ 'spider_counter after feedback: ', spider_counter, spotlight ]
            - exec:
                Action: give_feedback
                ActionId: 1
                Delay: 1
        Dst:
          Object: gnome
      - Src:
          Object: gnome
          Preconditions:
            - eq: [ spotlight, 0 ]
          Commands:
            - if:
                Conditions:
                  eq: [ spider_counter, 1 ]
                OnTrue:
                  - reward: 2
                  - print: [ "yay2" ]
                OnFalse:
                  - reward: -1
                  - print: [ "boo2" ]
            - set: [ spider_counter, 0 ]
            - print: [ 'spider_counter after feedback: ', spider_counter, spotlight ]
            - exec:
                Action: give_feedback
                ActionId: 1
                Delay: 1
        Dst:
          Object: gnome

Objects:
  - Name: gnome
    Z: 2
    MapCharacter: g
    InitialActions:
      - Action: give_feedback
        ActionId: 1
        Delay: 1
    Variables:
      - Name: spotlight
        InitialValue: 0
      - Name: spider_counter
        InitialValue: 0
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/avatars/gnome-1.png
        - Image: oryx/oryx_iso_dungeon/avatars/spider-fire-1.png
      Block2D:
        - Shape: square
          Color: [ 0.0, 0.8, 0.0 ]
          Scale: 0.5
        - Shape: triangle
          Color: [ 0.0, 0.5, 0.2 ]
          Scale: 0.8

  - Name: spider
    Z: 1
    InitialActions:
      - Action: spider_random_movement
        Randomize: true
    MapCharacter: s
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/avatars/spider-1.png
      Block2D:
        - Shape: triangle
          Color: [ 0.2, 0.2, 0.9 ]
          Scale: 0.5

  - Name: water
    MapCharacter: w
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/water-1.png
          Offset: [ 0, 4 ]
          TilingMode: ISO_FLOOR
      Block2D:
        - Color: [ 0.0, 0.0, 0.8 ]
          Shape: square

  - Name: right_exit
    MapCharacter: e
    Observers:
      Isometric:
        - Image: oryx/oryx_iso_dungeon/water-1.png
          Offset: [ 0, 4 ]
          TilingMode: ISO_FLOOR
      Block2D:
        - Color: [ 0.0, 0.0, 0.8 ]
          Shape: square
```
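For context, here is a minimal sketch of how a GDY file like this can be registered and loaded through Griddly's Gym factory. The filename `particle_sensor.yaml` and the environment name `ParticleSensor` are assumptions for illustration.

```python
import gym
from griddly import GymWrapperFactory, gd

# Register the GDY above as a Gym environment (filename and name are placeholders).
wrapper = GymWrapperFactory()
wrapper.build_gym_from_yaml(
    'ParticleSensor',
    'particle_sensor.yaml',
    level=0,
    player_observer_type=gd.ObserverType.VECTOR,
    global_observer_type=gd.ObserverType.BLOCK_2D,
)

env = gym.make('GDY-ParticleSensor-v0')
env.reset()
```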
Speaking of `gym.spaces` strangeness in `RLlibEnv` and `RLlibMultiAgentWrapper`: in `RLlibEnv`, why do we use only a single instance of the action space list when there are multiple players?
https://github.com/Bam4d/Griddly/blob/3ea10425d1981d2bbc07ba6d3efa02d199fd3739/python/griddly/util/rllib/environment/core.py#L131-L139
By doing so, the action space reported by `RLlibMultiAgentWrapper` becomes incorrect. Where it should be `self.action_space = gym.spaces.Dict({a: self.single_action_space for a in self._active_agents})`, it is instead just, e.g., `Discrete(x)`.
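For concreteness, here's roughly what the mismatch looks like; this is a sketch assuming 4 players and a hypothetical `Discrete(5)` per-player space (the real spaces come from the loaded GDY):

```python
import gym

single_action_space = gym.spaces.Discrete(5)   # hypothetical per-player space
active_agents = {1, 2, 3, 4}                   # Griddly player ids start at 1

# What the multi-agent wrapper should report:
expected = gym.spaces.Dict({a: single_action_space for a in active_agents})

# What it currently reports instead: just the single-player space.
actual = single_action_space
```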
So, in `RLlibMultiAgentWrapper`, I added:
```python
def __init__(...):
    ...
    self.is_reset = False
    self.reset()
```

and

```python
def reset(self, **kwargs):
    obs = super().reset(**kwargs)
    if not self.is_reset:
        self.is_reset = True
        self.single_action_space = self.action_space
        self.single_observation_space = self.observation_space
        self._active_agents.update([a + 1 for a in range(self.player_count)])
        self.action_space = gym.spaces.Dict({a: self.single_action_space for a in self._active_agents})
        self.observation_space = gym.spaces.Dict({a: self.single_observation_space for a in self._active_agents})
    return self._to_multi_agent_map(obs)
```
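With that change, a quick sanity check looks something like the following. `env` here is a hypothetical `RLlibMultiAgentWrapper` instance built for the 4-player environment above, and the last assert assumes `_to_multi_agent_map` keys observations by agent id:

```python
import gym

obs = env.reset()  # `env` is a hypothetical RLlibMultiAgentWrapper instance
assert isinstance(env.action_space, gym.spaces.Dict)        # per-agent action spaces
assert isinstance(env.observation_space, gym.spaces.Dict)   # per-agent observation spaces
assert env.action_space[1] == env.single_action_space       # each entry is the single-agent space
assert set(obs.keys()) == env._active_agents                # assumes obs is keyed by agent id
```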
Found some breaking behaviour in `GymWrapper::_create_action_space()`.
https://github.com/Bam4d/Griddly/blob/3ea10425d1981d2bbc07ba6d3efa02d199fd3739/python/griddly/GymWrapper.py#L417-L419
If `self.action_space` is, e.g., `Discrete(2)` while `action_space` is `[Discrete(2), Discrete(2), ...]`, then the above line will break. Usually this is masked because the agents have a `MultiDiscrete` action space, so the zip unpacks a `self.action_space` of `MultiDiscrete([2, 5])` into `[Discrete(2), Discrete(5)]` and iterates over that, and an `action_space` of `[MultiDiscrete([2, 5]), MultiDiscrete([2, 5])]` behaves fine. I fixed it by adding a guard between lines 417 and 418. This occurred when running a multi-agent env where each agent's action space is `Discrete`.
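As a rough illustration (a hypothetical sketch, not the actual lines from GymWrapper.py; the attribute names are assumptions), a guard like this would cover the `Discrete` case:

```python
# Hypothetical guard: a single Discrete space is not iterable, so normalize it to a
# per-player list before zipping it against the per-player `action_space` list.
if isinstance(self.action_space, gym.spaces.Discrete):
    self.action_space = [self.action_space] * self.player_count
```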