Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning
https://minigrid.farama.org/

[Question] Is SymbolicObsWrapper wrong in Dist Shift? #327

Closed liruiluo closed 1 year ago

liruiluo commented 1 year ago

Question

When I use SymbolicObsWrapper on MiniGrid-DistShift1-v0, the returned observation is:

{'image': array([[[ 0,  0,  2],
    [ 0,  1,  2],
    [ 0,  2,  2],
    [ 0,  3,  2],
    [ 0,  4,  2],
    [ 0,  5,  2],
    [ 0,  6,  2]],

   [[ 1,  0,  2],
    [ 1,  1,  2],
    [ 1,  2,  2],
    [ 1,  3, -1],
    [ 1,  4, -1],
    [ 1,  5,  9],
    [ 1,  6,  9]],

   [[ 2,  0,  9],
    [ 2,  1, -1],
    [ 2,  2,  8],
    [ 2,  3,  2],
    [ 2,  4,  2],
    [ 2,  5, -1],
    [ 2,  6, -1]],

   [[ 3,  0,  9],
    [ 3,  1,  9],
    [ 3,  2,  9],
    [ 3,  3, -1],
    [ 3,  4, -1],
    [ 3,  5,  2],
    [ 3,  6,  2]],

   [[ 4,  0, -1],
    [ 4,  1, -1],
    [ 4,  2, -1],
    [ 4,  3, -1],
    [ 4,  4, -1],
    [ 4,  5, -1],
    [ 4,  6, -1]],

   [[ 5,  0,  2],
    [ 5,  1,  2],
    [ 5,  2, -1],
    [ 5,  3, -1],
    [ 5,  4, -1],
    [ 5,  5, -1],
    [ 5,  6, -1]],

   [[ 6,  0, -1],
    [ 6,  1, -1],
    [ 6,  2,  2],
    [ 6,  3,  2],
    [ 6,  4, -1],
    [ 6,  5, -1],
    [ 6,  6, -1]],

   [[ 7,  0, -1],
    [ 7,  1, -1],
    [ 7,  2, -1],
    [ 7,  3, -1],
    [ 7,  4,  2],
    [ 7,  5,  2],
    [ 7,  6,  2]],

   [[ 8,  0,  2],
    [ 8,  1,  2],
    [ 8,  2,  2],
    [ 8,  3,  2],
    [ 8,  4,  2],
    [ 8,  5,  2],
    [ 8,  6,  2]]]), 'direction': 3, 'mission': 'get to the green goal square'}

However, the positions of the lava and the goal are wrong. Why does this happen?

pseudo-rnd-thoughts commented 1 year ago

Could you provide some code that prints the actual coordinates of the lava and the goal alongside their symbolic coordinates, so we can see the difference and confirm the bug?

liruiluo commented 1 year ago

Here is the code:

import gym
import gym_minigrid  # importing registers the MiniGrid environments with gym
import matplotlib.pyplot as plt
from gym_minigrid.wrappers import SymbolicObsWrapper, PositionObsWrapper

env_name = "MiniGrid-DistShift1-v0"

env = gym.make(env_name)

# env = SymbolicObsWrapper(env)
env = PositionObsWrapper(env)

obs = env.reset()
obs, r, done, _ = env.step(0)  # action 0 turns the agent left
print(obs)

The result is: [1. 1. 2. 7. 6.75]. The actual coordinate of the goal is (2,7). Here is the PositionObsWrapper:

# Imports the snippet relies on
import gym
import numpy as np
from gym import spaces
from gym_minigrid.minigrid import Goal

class PositionObsWrapper(gym.core.ObservationWrapper):

    def __init__(self, env, type='slope'):
        super().__init__(env)
        self.goal_position = None
        self.type = type
        self.observation_space = spaces.Box(
            low=0,
            high=1,
            shape=(5,),
            dtype='float32'
        )

    def reset(self):
        obs = self.env.reset()
        if not self.goal_position:
            # Flat indices of every Goal object in the grid's cell list
            self.goal_position = [x for x, y in enumerate(self.grid.grid) if isinstance(y, Goal)]
            if len(self.goal_position) >= 1:  # in case there are multiple goals; needs to be handled for other env types
                # Turn the first flat index into a coordinate pair
                self.goal_position = (int(self.goal_position[0] / self.height), self.goal_position[0] % self.width)
        obs = np.array([self.agent_pos[0], self.agent_pos[1], self.goal_position[0], self.goal_position[1], obs['direction'] * self.width / 4])
        return obs

    def observation(self, obs):
        obs = np.array([self.agent_pos[0], self.agent_pos[1], self.goal_position[0], self.goal_position[1], obs['direction'] * self.width / 4])
        # obs['goal_direction'] = np.arctan(slope) if self.type == 'angle' else slope
        return obs

And the image: image
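
A note on the wrapper above: it recovers the goal position from a flat index into `grid.grid`. The sketch below shows that conversion in isolation. It assumes the cell list is row-major (index = y * width + x, which is how Minigrid's `Grid.get(i, j)` addresses cells) and uses the 9x7 size implied by the observation shape. Both helper names are hypothetical and only serve the comparison.

```python
def flat_to_xy(idx: int, width: int) -> tuple[int, int]:
    """Convert a row-major flat index into (x, y) = (column, row)."""
    y, x = divmod(idx, width)
    return x, y


def wrapper_formula(idx: int, width: int, height: int) -> tuple[int, int]:
    """The conversion used in PositionObsWrapper.reset, reproduced for comparison."""
    return int(idx / height), idx % width


WIDTH, HEIGHT = 9, 7       # DistShift1 grid size, taken from the observation shape
goal_idx = 1 * WIDTH + 7   # flat index of the cell at x=7, y=1 (assumed row-major)

print(flat_to_xy(goal_idx, WIDTH))               # (7, 1)
print(wrapper_formula(goal_idx, WIDTH, HEIGHT))  # (2, 7)
```

Under that assumption, dividing by the height instead of the width is what produces (2, 7) in the printed result, while the cell itself sits at (7, 1).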

BolunDai0216 commented 1 year ago

There was an issue with how the objects were mapped to the grid, which is now fixed in PR #331. Now when I run the following code:

import gymnasium as gym
import numpy as np
import minigrid
from minigrid.core.constants import OBJECT_TO_IDX

print(f"{OBJECT_TO_IDX=}")

env = gym.make("MiniGrid-DistShift1-v0", render_mode="rgb_array")
sym_env = minigrid.wrappers.SymbolicObsWrapper(env)
obs, info = sym_env.reset(seed=123)

grid = obs["image"]
print(grid[np.where(grid[:, :, 2] == OBJECT_TO_IDX["goal"])])
print(grid)

I get the output

OBJECT_TO_IDX={'unseen': 0, 'empty': 1, 'wall': 2, 'floor': 3, 'door': 4, 'key': 5, 'ball': 6, 'box': 7, 'goal': 8, 'lava': 9, 'agent': 10}
[[7 1 8]]
[[[ 0  0  2]
  [ 0  1  2]
  [ 0  2  2]
  [ 0  3  2]
  [ 0  4  2]
  [ 0  5  2]
  [ 0  6  2]]

 [[ 1  0  2]
  [ 1  1 10]
  [ 1  2 -1]
  [ 1  3 -1]
  [ 1  4 -1]
  [ 1  5 -1]
  [ 1  6  2]]

 [[ 2  0  2]
  [ 2  1 -1]
  [ 2  2 -1]
  [ 2  3 -1]
  [ 2  4 -1]
  [ 2  5 -1]
  [ 2  6  2]]

 [[ 3  0  2]
  [ 3  1  9]
  [ 3  2  9]
  [ 3  3 -1]
  [ 3  4 -1]
  [ 3  5 -1]
  [ 3  6  2]]

 [[ 4  0  2]
  [ 4  1  9]
  [ 4  2  9]
  [ 4  3 -1]
  [ 4  4 -1]
  [ 4  5 -1]
  [ 4  6  2]]

 [[ 5  0  2]
  [ 5  1  9]
  [ 5  2  9]
  [ 5  3 -1]
  [ 5  4 -1]
  [ 5  5 -1]
  [ 5  6  2]]

 [[ 6  0  2]
  [ 6  1 -1]
  [ 6  2 -1]
  [ 6  3 -1]
  [ 6  4 -1]
  [ 6  5 -1]
  [ 6  6  2]]

 [[ 7  0  2]
  [ 7  1  8]
  [ 7  2 -1]
  [ 7  3 -1]
  [ 7  4 -1]
  [ 7  5 -1]
  [ 7  6  2]]

 [[ 8  0  2]
  [ 8  1  2]
  [ 8  2  2]
  [ 8  3  2]
  [ 8  4  2]
  [ 8  5  2]
  [ 8  6  2]]]

which is the same as the environment:

Figure_1
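
For readers wondering how the first output in this thread could arise, here is a minimal NumPy-only sketch. It is not the code from PR #331. It assumes the object ids come from a row-major flat list (index = y * width + x) and compares two ways of turning that list into a (width, height) array: reshaping directly, and reshaping to (height, width) followed by a transpose. The layout is reconstructed from the output above (the agent is left out), and `symbolic` is a hypothetical helper.

```python
import numpy as np

WALL, GOAL, LAVA, EMPTY = 2, 8, 9, -1
W, H = 9, 7

# Build a row-major flat list of object ids: index = y * W + x
flat = np.full(W * H, EMPTY)
for y in range(H):
    for x in range(W):
        if x in (0, W - 1) or y in (0, H - 1):
            flat[y * W + x] = WALL
for x in (3, 4, 5):
    for y in (1, 2):
        flat[y * W + x] = LAVA
flat[1 * W + 7] = GOAL


def symbolic(objects_wh: np.ndarray) -> np.ndarray:
    """Stack (x, y, object id) into an array of shape (W, H, 3)."""
    xs, ys = np.mgrid[:W, :H]
    return np.stack([xs, ys, objects_wh], axis=-1)


mismatched = symbolic(flat.reshape(W, H))    # treats row-major data as if indexed [x, y]
consistent = symbolic(flat.reshape(H, W).T)  # reshape as [y, x], then transpose to [x, y]

for name, obs in (("mismatched", mismatched), ("consistent", consistent)):
    goal = obs[obs[:, :, 2] == GOAL]
    print(name, goal)
```

In this sketch the mismatched variant reports the goal at (2, 2), as in the first output of the thread, and the consistent variant reports (7, 1), as in the output after the fix.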