Minigrid version - GitHub Issues

Acedorkz commented 6 days ago

Hello, I am trying to run on the MiniGrid environments. Could you please tell me which versions of minigrid, gymnasium, and gym you used? I want to test on envs such as MiniGrid-ObstructedMaze-Full and MiniGrid-MultiRoom-N12-S10. Thanks a lot for your help; I look forward to your reply.

yuanmingqi commented 6 days ago

Hi. The versions are as follows:

gymnasium 0.28.1
gym 0.26.1
minigrid 2.3.1
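
A quick way to compare a local install against the versions listed above is sketched below. The helper name `find_mismatches` is made up for illustration; it relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# Versions quoted in this thread.
EXPECTED = {"gymnasium": "0.28.1", "gym": "0.26.1", "minigrid": "2.3.1"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

If nothing is printed, the installed packages match the versions above.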

To use unregistered envs like MiniGrid-MultiRoom-N12-S10, you need to register them yourself with code like:

import minigrid
from gymnasium.envs.registration import register

register(
    id="MiniGrid-MultiRoom-N10-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 10, "maxNumRooms": 10, "maxRoomSize": 10},
)

register(
    id="MiniGrid-MultiRoom-N12-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 12, "maxNumRooms": 12, "maxRoomSize": 10},
)
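
The two registrations above follow the same naming pattern, so they can be generated by a small helper. This is only a sketch: `multiroom_spec` and `register_multiroom` are hypothetical names, not part of minigrid or RLeXplore, and the membership check against `registry` is an added safeguard against registering the same id twice.

```python
def multiroom_spec(num_rooms: int, max_room_size: int) -> dict:
    """Build the register() arguments for a fixed-size MultiRoom variant."""
    return {
        "id": f"MiniGrid-MultiRoom-N{num_rooms}-S{max_room_size}-v0",
        "entry_point": "minigrid.envs:MultiRoomEnv",
        "kwargs": {
            "minNumRooms": num_rooms,
            "maxNumRooms": num_rooms,
            "maxRoomSize": max_room_size,
        },
    }


def register_multiroom(num_rooms: int, max_room_size: int) -> str:
    """Register the variant with gymnasium (if not yet registered) and return its id."""
    # Imported lazily so that multiroom_spec() stays usable without these packages.
    import minigrid  # noqa: F401  (importing registers the built-in MiniGrid envs)
    from gymnasium.envs.registration import register, registry

    spec = multiroom_spec(num_rooms, max_room_size)
    if spec["id"] not in registry:
        register(**spec)
    return spec["id"]
```

After `env_id = register_multiroom(12, 10)`, the environment can be created with `gymnasium.make(env_id)`.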

Acedorkz commented 6 days ago

Thanks for the versions, which help a lot. Sorry, one more question, about the encoder_model used for MultiRoom-N10-S10-v0: neither Espeholt nor Mnih works. Could you give some guidance on how to run the MiniGrid envs to replicate the results at https://wandb.ai/yuanmingqi/RLeXplore/reportlist? Thanks again. Have a nice day.

yuanmingqi commented 5 days ago

Hi, the code for the MiniGrid task is an experimental version that hasn't been uploaded yet. In the meantime, you can adapt the current code as follows.

Change the env code:

# Assumes the existing env module already imports gym (gymnasium), Callable,
# the wrappers used below, the vector env classes, and Gymnasium2Torch.
def make_minigrid_env(
    env_id: str = "MiniGrid-DoorKey-5x5-v0",
    num_envs: int = 8,
    fully_observable: bool = False,
    fully_numerical: bool = False,
    seed: int = 0,
    frame_stack: int = 4,
    device: str = "cpu",
    asynchronous: bool = False,
) -> Gymnasium2Torch:
    """Create MiniGrid environments.

    Args:
        env_id (str): Name of environment.
        num_envs (int): Number of environments.
        fully_observable (bool): Fully observable gridworld using a compact grid encoding instead of the agent view.
        fully_numerical (bool): Transforms the observation space (that has a textual component) to a fully numerical
            observation space, where the textual instructions are replaced by arrays representing the indices of each
            word in a fixed vocabulary.
        seed (int): Random seed.
        frame_stack (int): Number of stacked frames.
        device (str): Device to convert the data.
        asynchronous (bool): `True` for creating asynchronous environments,
            and `False` for creating synchronous environments.

    Returns:
        The vectorized environments.
    """

    def make_env(env_id: str, seed: int) -> Callable:
        def _thunk():
            env = gym.make(env_id)

            # env = RGBImgPartialObsWrapper(env)
            env = ImageTranspose(env)
            env = ImgObsWrapper(env)
            # env = ResizeObservation(env, 84)
            # env = FrameStack(env, k=frame_stack)

            env.action_space.seed(seed)
            env.observation_space.seed(seed)

            return env

        return _thunk

    # One thunk per environment, each with its own seed
    envs = [make_env(env_id, seed + i) for i in range(num_envs)]

    if asynchronous:
        envs = AsyncVectorEnv(envs)
    else:
        envs = SyncVectorEnv(envs)

    # Scale the extrinsic reward by 100
    envs = TransformReward(envs, lambda r: 100.0 * r)
    envs = RecordEpisodeStatistics(envs)

    return Gymnasium2Torch(envs, device=device)
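
To make the effect of the `TransformReward(envs, lambda r: 100.0 * r)` step concrete, the sketch below reproduces MiniGrid's success reward, `1 - 0.9 * (step_count / max_steps)`, and applies the same scaling. The function names are illustrative only. MiniGrid gives this reward only when the goal is reached, so an episode that never reaches the goal returns zero with or without the scaling.

```python
def minigrid_success_reward(step_count: int, max_steps: int) -> float:
    """Reward MiniGrid gives on reaching the goal after `step_count` steps."""
    return 1.0 - 0.9 * (step_count / max_steps)


def scale_reward(reward: float, scale: float = 100.0) -> float:
    """Same transformation as the lambda passed to TransformReward above."""
    return scale * reward


if __name__ == "__main__":
    max_steps = 240  # arbitrary value, chosen only for this illustration
    for steps in (0, 60, 240):
        raw = minigrid_success_reward(steps, max_steps)
        print(steps, round(raw, 3), round(scale_reward(raw), 1))
```

With this scaling, a successful episode yields a return between roughly 10 and 100, depending on how quickly the goal is reached.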

Use a new MiniGrid encoder:


import torch
from torch import nn
from rllte.common.prototype import BaseEncoder

class MinigridEncoder(BaseEncoder):
    def __init__(self, observation_space, features_dim: int = 512) -> None:
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 16, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 32, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (2, 2)),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with torch.no_grad():
            observations = observation_space.sample()
            observations = torch.as_tensor(observations[None]).float()
            n_flatten = self.cnn(observations).float().shape[1]

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        # observations = observations.permute(0, 3, 1, 2).float()
        return self.linear(self.cnn(observations.float()))
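
One plausible reason the Atari-style encoders fail here is input size. The sketch below uses the standard convolution output-size formula. It assumes MiniGrid's default 7x7 partial view and the usual Mnih et al. layer stack (8x8 stride 4, 4x4 stride 2, 3x3 stride 1), neither of which is stated in this thread. `conv_out` and `flatten_dim` are illustrative helpers.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of one Conv2d layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flatten_dim(size, layers, out_channels):
    """Apply (kernel, stride) layers in order to a square input.

    Returns the flattened feature count, or None if the input becomes
    smaller than a kernel (where Conv2d would raise an error).
    """
    for kernel, stride in layers:
        if size < kernel:
            return None
        size = conv_out(size, kernel, stride)
    return out_channels * size * size


MINIGRID_LAYERS = [(2, 1), (2, 1), (2, 1)]  # the three 2x2 convs above
MNIH_LAYERS = [(8, 4), (4, 2), (3, 1)]      # assumed Atari-style stack

if __name__ == "__main__":
    print(flatten_dim(7, MINIGRID_LAYERS, 64))  # 7 -> 6 -> 5 -> 4, 64 channels
    print(flatten_dim(7, MNIH_LAYERS, 64))      # 8x8 kernel does not fit a 7x7 input
    print(flatten_dim(84, MNIH_LAYERS, 64))     # the 84x84 input those layers expect
```

Under these assumptions, a 7x7 observation leaves a 4x4x64 feature map for `MinigridEncoder`, while the first 8x8 kernel of the Atari-style stack is already larger than the input.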

Replace the old encoder, e.g. in E3B:

```py
# build the encoder and inverse dynamics model
self.encoder = MinigridEncoder(observation_space=observation_space).to(self.device)
```

Use the following hyperparameters:

intrinsic_reward = E3B(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device=device,
    n_envs=args.n_envs,
    rwd_norm_type="rms",
    obs_rms=True,
    update_proportion=1.0,
    gamma=args.int_gamma,
    encoder_model=encoder_model,
    weight_init="orthogonal",
    beta=0.25,
    latent_dim=args.hidden_dim,
)

I will update the code ASAP; you can give this a try in the meantime. Thanks!

Acedorkz commented 2 days ago

Hi, thanks for your kind replies. I have tried the above code. Unfortunately, I still cannot replicate the results on MiniGrid-MultiRoom-N12-S10-v0: the reward is always zero. Perhaps some hyperparameters are not set appropriately. Could you please provide them? Many thanks. I am also looking forward to the updated version.

yuanmingqi commented 2 days ago

Could you provide an email address? I can share the experiment code with you first.

Acedorkz commented 2 days ago

Thanks a lot. annezhu1212@outlook.com

yuanmingqi commented 22 hours ago

Sent via email.

RLE-Foundation / RLeXplore

Minigrid version #24