IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Can't figure out 'observation_space' configuration #427

mludvig commented 4 years ago

Hi, I'm building a new gym environment that needs a simple 'board' to record the game status, e.g. 10x10 cells, plus the agent position. Unfortunately I'm unable to figure out how to set up the observation_space structure.

This is what I tried:

    self._board_size = (10, 10)
    self._board = np.ones(self._board_size, dtype=np.int32)
    self._position = np.array([
        np.random.randint(self._board_size[0]),
        np.random.randint(self._board_size[1]),
    ])

And then the observation_space:

    self.observation_space = spaces.Dict({
        "board_status": spaces.Box(
            low=np.zeros(len(self._board.flatten()), dtype=np.int32),
            high=np.ones(len(self._board.flatten()), dtype=np.int32),
            dtype=np.int32),
        "position": spaces.Box(
            low=np.array((0,0)),
            high=np.array(self._board_size),
            dtype=np.int32),
    })

Now in the step() function I return a dictionary:

    def _get_observation(self):
        return {
            "board_status": self._board.flatten(),
            "position": self._position,
        }

However when I run it with coach it fails:

ValueError: The key for the input embedder (observation) must match 
    one of the following keys: dict_keys(['board_status', 'position',
    'measurements', 'action', 'goal'])

My presets file has this:

env_params = GymVectorEnvironment(level='Test-v0')

How can I return the board status and agent position to the agent?


I have also tried to return the non-flattened board, but that failed even sooner, during initialisation:

    self.observation_space = spaces.Dict({
        "board_status": spaces.Box(
            low=np.zeros(self._board_size, dtype=np.int64),
            high=np.ones(self._board_size, dtype=np.int64),
            dtype=np.int32),
        "position": spaces.Box(
            low=np.array((0,0)),
            high=np.array(self._board_size),
            dtype=np.int32),
    })

And I got:

Failed to instantiate Gym environment class Test-v0 with observation space type None

Can you provide any advice on how to do this, please?


I'm using:

numpy version: 1.17.4
gym version: 0.15.4
rl-coach version: 1.0.1
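For what it's worth, one workaround that side-steps Dict spaces entirely is to flatten the board and position into one vector and describe it with a single Box. A minimal numpy sketch (names and shapes are illustrative, not part of the thread):

```python
import numpy as np

# Illustrative sketch (not coach's API): flatten the 10x10 board and the
# (row, col) position into one vector, so a single Box space can describe
# the whole observation instead of a Dict.
BOARD_SIZE = (10, 10)
N_CELLS = BOARD_SIZE[0] * BOARD_SIZE[1]

def make_observation(board, position):
    # 100 board cells followed by 2 position coordinates -> shape (102,)
    return np.concatenate([board.flatten(), position]).astype(np.float32)

# Matching bounds for a flat Box space: cells are 0/1, coordinates are
# bounded by the board dimensions.
low = np.zeros(N_CELLS + 2, dtype=np.float32)
high = np.concatenate([np.ones(N_CELLS),
                       np.array(BOARD_SIZE) - 1]).astype(np.float32)

board = np.ones(BOARD_SIZE, dtype=np.int32)
position = np.array([3, 7])
obs = make_observation(board, position)
```

The observation space would then be a single `spaces.Box(low=low, high=high, dtype=np.float32)`, which coach should be able to treat as a plain vector observation.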

AustinDeric commented 4 years ago

Your observation_space_type is set to None, and you are getting caught here: https://github.com/NervanaSystems/coach/blob/0867d8d0fbc32057baa06fa65a0b68b271710120/rl_coach/environments/gym_environment.py#L359

Your observation is not getting assigned to one of the ObservationSpaceTypes: https://github.com/NervanaSystems/coach/blob/0867d8d0fbc32057baa06fa65a0b68b271710120/rl_coach/environments/gym_environment.py#L181

Try printing len(observation_space.shape) for each sub-space to determine which of your observations is causing this. Eyeballing your code, the shape of the "position" observation is 2, which is not supported.

Suggestion: there are a few ways to handle your case, but you don't need a float32 for the position. It looks like you are building a board game, so try a discrete space for "position":

    self.observation_space = spaces.Dict({
        "board_status": spaces.Box(
            low=np.zeros(len(self._board.flatten()), dtype=np.int32),
            high=np.ones(len(self._board.flatten()), dtype=np.int32),
            dtype=np.int32),
        # Discrete takes a single integer: one index per board cell
        "position": spaces.Discrete(self._board_size[0] * self._board_size[1])})
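As an aside (not from the thread): spaces.Discrete describes a single integer, so a 2-D (row, col) position needs to be collapsed into one index. A minimal sketch of that mapping, assuming a 10x10 board (helper names are my own):

```python
# Illustrative helpers: encode a (row, col) position on a 10x10 board
# as a single index in [0, rows * cols), as spaces.Discrete expects.
BOARD_SIZE = (10, 10)

def position_to_index(position, board_size=BOARD_SIZE):
    # Row-major encoding: (3, 7) on a 10-wide board -> 3 * 10 + 7 = 37.
    row, col = position
    return int(row * board_size[1] + col)

def index_to_position(index, board_size=BOARD_SIZE):
    # Inverse mapping back to (row, col).
    return divmod(index, board_size[1])
```

The environment's step() would then return `position_to_index(self._position)` under the "position" key instead of the 2-element array.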

I don't think this is an issue in coach; maybe some features around validating environment inputs could make debugging easier?


Also, your version of gym should be 0.12.5, per the rl_coach requirements.

amcb14 commented 4 years ago

Hi,

I am new to coach and I am trying it in my custom gym environment after playing a little bit with some available environments.

I had exactly the same problem mentioned by @mludvig when implementing a dictionary of spaces.Box with shape (1,):

    self.observation_space = spaces.Dict({"R1": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R2": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R3": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R4": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R5": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R6": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R7": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "R8": spaces.Box(low=0, high=3, shape=(1, ), dtype=np.int64),
                                          "Direction": spaces.Box(low=0, high=1, shape=(1, ), dtype=np.int64),
                                          })

ValueError: The key for the input embedder (observation) must match one of the following keys: dict_keys([ (...)

Actually, my first attempts were with a dictionary of discrete spaces (since coach does not seem to handle gym MultiDiscrete spaces):

    self.observation_space = spaces.Dict({"R1": spaces.Discrete(4),
                                          "R2": spaces.Discrete(4),
                                          "R3": spaces.Discrete(4),
                                          "R4": spaces.Discrete(4),
                                          "R5": spaces.Discrete(4),
                                          "R6": spaces.Discrete(4),
                                          "R7": spaces.Discrete(4),
                                          "R8": spaces.Discrete(4),
                                          "Direction": spaces.Discrete(2)})

But I got the error about the observation_space_type not being recognized:

Failed to instantiate Gym environment class <class 'rl_coach.environments.ewocc.ewoccEnv.ewoccEnv'> with observation space type None

Is there a better option to declare an observation_space that is a discrete array for a gym environment in coach?

Thank you!
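A possible workaround (a sketch, not from the thread): since every value here is a small bounded integer, the dict of scalar spaces can be packed into a single 9-element integer vector, avoiding both Dict and MultiDiscrete spaces:

```python
import numpy as np

# Illustrative sketch: pack the nine scalar readings into one vector so
# a single Box space can describe them.
KEYS = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "Direction"]

def dict_to_vector(readings):
    # A fixed key order keeps the layout stable across steps.
    return np.array([readings[k] for k in KEYS], dtype=np.int64)

# Matching Box bounds: R1..R8 in [0, 3], Direction in [0, 1].
low = np.zeros(len(KEYS), dtype=np.int64)
high = np.array([3] * 8 + [1], dtype=np.int64)

obs = dict_to_vector({k: 0 for k in KEYS})
```

The observation space would then be `spaces.Box(low=low, high=high, dtype=np.int64)`, which coach should handle as a plain vector observation.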

amcb14 commented 4 years ago

Update

After setting the observation space type in the preset, I could run it and avoid the error mentioned above (ValueError: The key for the input embedder (observation) must match one of the following keys: dict_keys([ (...)):

env_params.observation_space_type = 2

However, after some more attempts, I still cannot run it using a dictionary of discrete spaces...