DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Dict obs with box and discrete spaces normalisation #629

Closed liamf555 closed 3 years ago

liamf555 commented 3 years ago

Question

Is it possible to mix Box and Discrete spaces in a Dict observation whilst also using the VecNormalize wrapper?

Additional context

I'm attempting to build a dict observation consisting of a box space and a discrete space for a custom env, specified as:

self.observation_space = gym.spaces.Dict(
    spaces={
        "vec": gym.spaces.Box(-np.inf, np.inf, (8,), dtype=np.float32),
        "discrete": gym.spaces.Discrete(2),
    }
)

I am using PPO, so I vectorise the environment and wrap it in VecNormalize. The discrete observation starts out as 1 and at some point becomes 0, which causes the normalised observation to become negative. In the example below, I am running PPO with 8 envs: in 7 of them the discrete value is 1 and in one it is 0, which produces the negative value after normalisation:

obs: {'discrete': tensor([ 0.2085,  0.2085,  0.2085,  0.2085,  0.2085,  0.2085, -4.7953,  0.2085]), 'vec': tensor([[ 0.3396,  1.3658, -1.3030, -1.6530,  1.3601, -1.2858, -0.0848,  1.4730],
        [ 1.3653, -1.8538, -0.0659,  0.4101, -2.9554, -1.7149, -3.2497,  1.6209],
        [ 1.2471,  2.7916, -2.3026,  0.2600,  0.3495, -1.3930, -0.0848,  1.7688],
        [ 1.0757, -0.8906,  0.7379, -0.1837, -0.2545,  0.3068,  0.3674, -0.4499],
        [ 2.0280, -2.0443,  1.7680,  1.7685,  0.6806,  1.4615,  0.5934, -1.7811],
        [ 0.9729,  0.6722, -0.1616, -0.3783,  1.2651, -0.7620, -0.3108, -0.4499],
        [ 0.6956,  0.3284,  0.6919, -1.0233,  0.8267,  0.2034, -0.0848, -0.5978],
        [ 1.5304,  0.8715, -0.2174,  0.7762,  1.2413,  1.0145,  2.1759, -0.0062]])}

which causes an error in preprocess_obs, specifically at the lines:

elif isinstance(observation_space, spaces.Discrete):
    # One hot encoding and convert to float to avoid errors
    return F.one_hot(obs.long(), num_classes=observation_space.n).float()

which gives a RuntimeError:

return F.one_hot(obs.long(), num_classes=observation_space.n).float()
RuntimeError: Class values must be non-negative.
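
The path from a 0/1 observation to a negative class index can be reproduced with a few lines of NumPy. This is only a sketch under stated assumptions: it assumes the running statistics have seen 23 ones and a single zero (a guess that happens to give values close to those printed above), uses the usual standardisation (x - mean) / sqrt(var + epsilon), and uses a NumPy integer cast as a stand-in for obs.long(), which also truncates toward zero.

```python
import numpy as np

# Assumption: the running statistics have seen 24 values for the discrete
# key so far, 23 ones and a single zero (e.g. three batches of 8 envs).
seen = np.array([1.0] * 23 + [0.0])
mean, var = seen.mean(), seen.var()
epsilon = 1e-8  # small constant to avoid division by zero


def normalize(value):
    """Standardise a value with the running mean and variance."""
    return (value - mean) / np.sqrt(var + epsilon)


norm_one = normalize(1.0)   # small positive number
norm_zero = normalize(0.0)  # large negative number

# Casting to an integer type truncates toward zero, so the normalised zero
# becomes a negative integer, which is not a valid one-hot class index.
class_indices = np.array([norm_one, norm_zero]).astype(np.int64)
print(norm_one, norm_zero, class_indices)
```

Note that the positive value is also truncated, so even without the error the one-hot encoding would no longer distinguish the original classes correctly.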

Is what I am trying to do okay, and if so, how can I overcome this issue?

Miffyli commented 3 years ago

Looking over the code for VecNormalize, it seems that if the obs space is a Dict, it assumes all subspaces are Box and applies normalization to each of them. This should be updated to apply normalization only to Box spaces, which would make it work in your case. @araffin does this sound reasonable? At the very least, it should throw an exception if a subspace is not a Box.

araffin commented 3 years ago

Hello, yes I'm aware of this issue. I think we need to allow specifying either excluded_keys or included_keys in the constructor. I would appreciate a PR that solves this issue ;)
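
The idea discussed in the two comments above, normalising only selected keys of a Dict observation, might look roughly like the following. This is a minimal NumPy sketch, not the VecNormalize implementation: the function normalize_selected and its norm_keys argument are hypothetical names, and the mean and variance are computed from a single batch rather than tracked as running statistics.

```python
import numpy as np


def normalize_selected(obs, mean, var, norm_keys, epsilon=1e-8):
    """Standardise only the entries of a dict observation listed in norm_keys.

    Hypothetical helper: keys not in norm_keys are passed through unchanged.
    """
    out = {}
    for key, value in obs.items():
        if key in norm_keys:
            out[key] = (value - mean[key]) / np.sqrt(var[key] + epsilon)
        else:
            out[key] = value
    return out


# A batch from 8 envs, mirroring the observation space in the question.
rng = np.random.default_rng(0)
obs = {
    "vec": rng.normal(size=(8, 8)).astype(np.float32),
    "discrete": np.array([1, 1, 1, 1, 1, 1, 0, 1]),
}

# Assumption: statistics come from this one batch, for illustration only.
mean = {"vec": obs["vec"].mean(axis=0)}
var = {"vec": obs["vec"].var(axis=0)}

normalized = normalize_selected(obs, mean, var, norm_keys={"vec"})
print(normalized["discrete"])  # still valid class indices
```

With the discrete key left out of norm_keys, its values stay as non-negative integers, so a later one-hot encoding step receives valid class indices.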

kachayev commented 3 years ago

I recently faced exactly the same issue. I opened a PR with the solution that I used, though I'm not sure if the new param name is clear enough from an API perspective. Let me know what you think; I will be more than glad to update it.