DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Handling mission space in BabyAI env #1914

Closed. Chainesh closed this issue 2 months ago.

Chainesh commented 2 months ago

🐛 Bug

Hi,

I'm trying to train on a BabyAI environment restricted to specific missions. The observation space of BabyAI looks like this: Dict('direction': Discrete(4), 'image': Box(0, 255, (7, 7, 3), uint8), 'mission': MissionSpace(<function BabyAIMissionSpace._gen_mission at 0x7f813d9da940>, None)), and using RGBImgPartialObsWrapper I've converted it to Dict('direction': Discrete(4), 'image': Box(0, 255, (56, 56, 3), uint8), 'mission': MissionSpace(<function BabyAIMissionSpace._gen_mission at 0x7aa32616bc40>, None)).

Whether or not I wrap the environment in DummyVecEnv, it throws the same error, TypeError: 'NoneType' object is not iterable, because the shape of the mission space is None. How can I train this model? Ultimately I want to use MultiInputPolicy after converting the mission into a vector embedding and concatenating it with the image embedding.

Any help on this would be much appreciated. Let me know if something else is needed to solve this. Thanks :)

Code example

import gymnasium as gym
from minigrid.wrappers import RGBImgPartialObsWrapper
from stable_baselines3.common.vec_env import VecTransposeImage, DummyVecEnv
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO

class CustomEnv(gym.Env):
    """Keeps resetting the wrapped env until it samples the desired mission."""

    def __init__(self, env, mission):
        self.env = env
        self.observation_space = env.observation_space
        self.action_space = env.action_space
        self.mission = mission

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        while self.env.mission != self.mission:
            obs, info = self.env.reset(**kwargs)
        return obs, info

    def step(self, action):
        return self.env.step(action)

    def render(self, *args, **kwargs):
        return self.env.render(*args, **kwargs)

missions = ["pick up the red ball"]
for mission in missions:
    env = gym.make("BabyAI-PickupLoc-v0")  # add render_mode="human" to visualize
    env = RGBImgPartialObsWrapper(env)     # 7x7 symbolic view -> 56x56 RGB image
    env = CustomEnv(env, mission)
    print(env.observation_space)
    env = Monitor(env)
    env = DummyVecEnv([lambda: env])       # fails here: the mission space has no shape
    env = VecTransposeImage(env)
    print(env.observation_space)
    model = PPO("MlpPolicy", env, seed=42, verbose=1)
    model.learn(50000)

Relevant log output / Error message

Dict('direction': Discrete(4), 'image': Box(0, 255, (56, 56, 3), uint8), 'mission': MissionSpace(<function BabyAIMissionSpace._gen_mission at 0x7aa32616bc40>, None))
Traceback (most recent call last):
  File "/home/scl/ranfom.py", line 43, in <module>
    env = DummyVecEnv([lambda: env])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scl/anaconda3/lib/python3.11/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 46, in __init__
    self.buf_obs = OrderedDict([(k, np.zeros((self.num_envs, *tuple(shapes[k])), dtype=dtypes[k])) for k in self.keys])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scl/anaconda3/lib/python3.11/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 46, in <listcomp>
    self.buf_obs = OrderedDict([(k, np.zeros((self.num_envs, *tuple(shapes[k])), dtype=dtypes[k])) for k in self.keys])
                                                              ^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not iterable

System Info

No response

qgallouedec commented 2 months ago

Hey, please provide the system info. As far as I remember, the BabyAI mission is a str, right?

Chainesh commented 2 months ago

Here is the system info as requested:

Yes, you're correct, it should be a string based on the BabyAI level, but the image and direction keys are definitely not causing the issue. Even simply using this code

env = gym.make("BabyAI-PickupLoc-v0")
model = PPO("MlpPolicy", env, seed=42, verbose=1)  # PPO wraps the env in a DummyVecEnv internally
model.learn(50000)
env.close()

throws the same error.
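
For reference, dropping the mission key entirely should avoid the error, since the remaining subspaces all have well-defined shapes. A rough, untested sketch using minigrid's ImgObsWrapper (which keeps only the image):

import gymnasium as gym
from minigrid.wrappers import RGBImgPartialObsWrapper, ImgObsWrapper
from stable_baselines3 import PPO

# Hypothetical check: with the mission (and direction) keys dropped, the
# observation space is a plain 56x56x3 image Box, so the shape lookup that
# fails in DummyVecEnv no longer involves a MissionSpace.
env = gym.make("BabyAI-PickupLoc-v0")
env = RGBImgPartialObsWrapper(env)  # 56x56x3 RGB partial view
env = ImgObsWrapper(env)            # keep only obs["image"]
model = PPO("CnnPolicy", env, seed=42, verbose=1)
model.learn(1000)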

qgallouedec commented 2 months ago

Text observations are not supported. Technically, to make it work, you can convert the text to a discrete value, for example. That said, depending on what you're trying to do and show, this solution may not be relevant.
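
For example, something along these lines (an untested sketch, not an SB3 API; it assumes the full set of missions you care about is known in advance):

import gymnasium as gym
from gymnasium import spaces

class MissionToDiscreteWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper: replace the 'mission' string with a Discrete index."""

    def __init__(self, env, known_missions):
        super().__init__(env)
        self.mission_to_id = {m: i for i, m in enumerate(known_missions)}
        new_spaces = dict(env.observation_space.spaces)
        new_spaces["mission"] = spaces.Discrete(len(known_missions))
        self.observation_space = spaces.Dict(new_spaces)

    def observation(self, obs):
        obs = dict(obs)
        obs["mission"] = self.mission_to_id[obs["mission"]]
        return obs

Once every entry of the Dict space has a defined shape, DummyVecEnv can allocate its buffers and MultiInputPolicy becomes usable.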

Chainesh commented 2 months ago

Can I convert the text into a vector embedding and then use it?

qgallouedec commented 2 months ago

It should work, yes.
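
For instance, a rough sketch (not an SB3 feature; a simple bag-of-words encoding stands in for a real text embedding here):

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MissionToVectorWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper: encode the 'mission' string as a fixed-size float vector."""

    def __init__(self, env, vocab):
        super().__init__(env)
        self.word_to_id = {w: i for i, w in enumerate(vocab)}
        new_spaces = dict(env.observation_space.spaces)
        new_spaces["mission"] = spaces.Box(0.0, 1.0, (len(vocab),), dtype=np.float32)
        self.observation_space = spaces.Dict(new_spaces)

    def observation(self, obs):
        # Multi-hot encoding of the mission words over a fixed vocabulary.
        vec = np.zeros(len(self.word_to_id), dtype=np.float32)
        for word in obs["mission"].split():
            if word in self.word_to_id:
                vec[self.word_to_id[word]] = 1.0
        obs = dict(obs)
        obs["mission"] = vec
        return obs

With that, PPO("MultiInputPolicy", env, ...) should accept the observation, and the default CombinedExtractor will concatenate the flattened mission vector with the image features. For a proper embedding you would swap the bag-of-words encoding for a pretrained sentence encoder and size the Box accordingly.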

Chainesh commented 2 months ago

Thanks :)