The input shape of MLP for discrete observation space

Denys88 / rl_games

RL implementations

MIT License


The input shape of MLP for discrete observation space #160

Closed yuemingl closed 2 years ago

yuemingl commented 2 years ago

I am trying to use BlackjackEnv (https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py) in rl_games. It seems rl_games doesn't support discrete observation spaces such as:

spaces.Tuple((spaces.Discrete(32), spaces.Discrete(11), spaces.Discrete(2)))

Are there any plans to support this feature?

Denys88 commented 2 years ago

Hi @yuemingl, gym.spaces.Tuple([gym.spaces.Discrete(2), gym.spaces.Discrete(3)]) should work. Could you check your yaml config? I have a separate model for it:

model:
  name: multi_discrete_a2c

I should merge it with plain discrete_a2c but haven't done it yet.

yuemingl commented 2 years ago

My problem is that I get an error from the function _calc_input_size() in network_builder.py. The cause of the error is that the shape of the discrete observation space is '()'. I found an issue, https://github.com/openai/gym/issues/791, that discusses discrete observation/action spaces. According to it, returning '()' for Discrete is the expected behavior. Do you use one-hot encoding for discrete observation/action spaces? For example, do you expect a shape like '(5,)' instead of '()' for spaces.Discrete(5)?
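To make the question concrete, here is a minimal sketch of what one-hot encoding a Discrete(5) sample would look like. The helper name one_hot is hypothetical and not part of gym or rl_games; plain Python lists are used to keep it dependency-free.

```python
def one_hot(index, n):
    """Encode a Discrete(n) sample (an int in [0, n)) as a length-n list."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is out of range for Discrete({n})")
    vec = [0.0] * n
    vec[index] = 1.0  # mark the active category
    return vec


encoded = one_hot(3, 5)
print(encoded)       # [0.0, 0.0, 0.0, 1.0, 0.0]
print(len(encoded))  # 5, i.e. a shape of (5,) rather than ()
```

With this encoding the network input size for Discrete(5) would be 5, whereas the raw sample is a scalar with an empty shape.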

Denys88 commented 2 years ago

Ah, I missed that. Yes, it might not be supported. There are two things you can do right now. I support multiple observations as a Dict:

spaces = {
    'pos': gym.spaces.Box(low=0, high=1, shape=(2, ), dtype=np.float32),
    'info': gym.spaces.Box(low=0, high=1, shape=(4, ), dtype=np.float32),
}
self.observation_space = gym.spaces.Dict(spaces)

But you will need to create a custom neural network. Here is a simple test network example: https://github.com/Denys88/rl_games/blob/master/rl_games/envs/test_network.py

I think your best solution is to create a wrapper that merges all observations into one, if possible, and to try it first with the default solution.
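The wrapper idea could be sketched roughly as follows for the Blackjack observation tuple. This is an assumption-laden illustration, not rl_games code: FlattenTupleObs and StubEnv are hypothetical names, the environment is assumed to follow the older gym API where step() returns four values, and plain lists are used so the sketch runs without gym installed.

```python
class FlattenTupleObs:
    """Hypothetical wrapper: one-hot encodes each discrete component of a
    tuple observation and concatenates the parts into one flat vector.

    `env` is any object with gym-style reset() and step(action) methods;
    `sizes` holds n for each Discrete(n) component of the Tuple space.
    """

    def __init__(self, env, sizes=(32, 11, 2)):
        self.env = env
        self.sizes = tuple(sizes)
        # Length of the merged vector. A real wrapper would also expose a
        # matching observation_space, e.g. a Box of shape (obs_size,)
        # (assumption; omitted here to stay dependency-free).
        self.obs_size = sum(self.sizes)

    def _merge(self, obs):
        flat = []
        for value, n in zip(obs, self.sizes):
            part = [0.0] * n
            part[int(value)] = 1.0  # int() also handles bool components
            flat.extend(part)
        return flat

    def reset(self):
        return self._merge(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._merge(obs), reward, done, info


class StubEnv:
    """Stand-in for BlackjackEnv that returns fixed observations."""

    def reset(self):
        return (14, 10, False)

    def step(self, action):
        return (20, 10, True), 1.0, True, {}


env = FlattenTupleObs(StubEnv())
obs = env.reset()
print(len(obs), sum(obs))
```

The merged observation has a fixed length of 32 + 11 + 2 = 45, which gives an MLP a well-defined input size instead of the empty shape reported for each Discrete component.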

Denys88 commented 2 years ago

@yuemingl I hope you made it work. Closing this for now :)