RLE-Foundation / RLeXplore

RLeXplore provides stable baselines of exploration methods in reinforcement learning, such as intrinsic curiosity module (ICM), random network distillation (RND) and rewarding impact-driven exploration (RIDE).
https://docs.rllte.dev/
MIT License
367 stars 15 forks

Minigrid version #24

Open Acedorkz opened 1 month ago

Acedorkz commented 1 month ago

Hello, I am trying to run on the MiniGrid environments. Could you please tell me which versions of minigrid, gymnasium, and gym to use? I want to test on environments such as MiniGrid-ObstructedMaze-Full and MiniGrid-MultiRoom-N12-S10. Thanks a lot for your help; I am looking forward to your reply.

yuanmingqi commented 1 month ago

Hi. The versions are as follows:

To use unregistered envs like MiniGrid-MultiRoom-N12-S10, you need to add code like this:

import minigrid
from gymnasium.envs.registration import register

register(
    id="MiniGrid-MultiRoom-N10-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 10, "maxNumRooms": 10, "maxRoomSize": 10},
)

register(
    id="MiniGrid-MultiRoom-N12-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 12, "maxNumRooms": 12, "maxRoomSize": 10},
)

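The two registrations above differ only in the room count, so the same pattern can be generated for other variants. The following is only a sketch: `multiroom_spec` and `make_multiroom` are hypothetical helper names, and it assumes a gymnasium version that exposes `gym.registry` as a dict keyed by env id.

```python
def multiroom_spec(num_rooms: int, max_room_size: int) -> dict:
    """Build register() arguments following the pattern shown above."""
    return {
        "id": f"MiniGrid-MultiRoom-N{num_rooms}-S{max_room_size}-v0",
        "entry_point": "minigrid.envs:MultiRoomEnv",
        "kwargs": {
            "minNumRooms": num_rooms,
            "maxNumRooms": num_rooms,
            "maxRoomSize": max_room_size,
        },
    }


def make_multiroom(num_rooms: int, max_room_size: int):
    """Register the variant if needed, then create it (hypothetical helper)."""
    # Imported lazily so the spec helper works without these packages installed.
    import gymnasium as gym
    import minigrid  # noqa: F401  (makes the minigrid package available)
    from gymnasium.envs.registration import register

    spec = multiroom_spec(num_rooms, max_room_size)
    # Assumption: gym.registry is a dict of already registered env ids.
    if spec["id"] not in gym.registry:
        register(**spec)
    return gym.make(spec["id"])
```

For example, `env = make_multiroom(12, 10)` followed by `obs, info = env.reset(seed=0)` should give a dict observation whose `"image"` entry is the agent's partial view.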
Acedorkz commented 1 month ago

Thanks for the versions, which helped a lot. Sorry, one more question, about the encoder_model used for MultiRoom-N10-S10-v0: neither Espeholt nor Mnih works. Could you give some guidance on how to run on the MiniGrid envs to replicate the results at https://wandb.ai/yuanmingqi/RLeXplore/reportlist? Thanks again. Have a nice day.

yuanmingqi commented 1 month ago

Hi, the code for the MiniGrid task is an experimental version that hasn't been uploaded yet. You can adapt the current code by adding the following encoder:

``` py
import torch
from torch import nn

# BaseEncoder is the encoder base class of the existing code base
# (its import path is not given in this thread).
class MinigridEncoder(BaseEncoder):
    def __init__(self, observation_space, features_dim: int = 512) -> None:
        super().__init__(observation_space, features_dim)
        # shape[0] is used as the channel count, so channels-first observations are expected
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 16, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 32, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (2, 2)),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with torch.no_grad():
            observations = observation_space.sample()
            observations = torch.as_tensor(observations[None]).float()
            n_flatten = self.cnn(observations).float().shape[1]

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        #observations = observations.permute(0, 3, 1, 2).float()
        return self.linear(self.cnn(observations.float()))
```

Then replace the old encoder with it, for example in E3B:

``` py
# build the encoder and inverse dynamics model
self.encoder = MinigridEncoder(observation_space=observation_space).to(self.device)
```

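As a sanity check on the "compute shape by doing one forward pass" step, the flattened size can also be worked out by hand. This sketch assumes MiniGrid's default 7x7 partial view passed channels-first, and relies on the `nn.Conv2d` defaults (stride 1, no padding) used in the encoder above; `conv2d_out` and `n_flatten` are hypothetical helper names.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of one spatial dimension after a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def n_flatten(height: int, width: int, kernels=(2, 2, 2), out_channels: int = 64) -> int:
    """Flattened feature size after the three 2x2 convolutions of the encoder."""
    for k in kernels:
        height = conv2d_out(height, k)
        width = conv2d_out(width, k)
    return out_channels * height * width


# Assumed 7x7 view: each 2x2 conv shrinks a side by one, 7 -> 6 -> 5 -> 4.
print(n_flatten(7, 7))  # 64 * 4 * 4 = 1024
```

If the value computed inside the encoder differs from this, the observation is probably not in the channels-first layout the encoder expects.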
Acedorkz commented 3 weeks ago

Hi, thanks for your kind replies. I have tried the above code. Unfortunately, I still cannot replicate the results on MiniGrid-MultiRoom-N12-S10-v0: the reward is always zero. Perhaps some hyperparameters are not set appropriately. Could you please provide them? Thanks a lot. I am also looking forward to the updated version.

yuanmingqi commented 3 weeks ago

Could you provide an email address? I can share the experiment code with you first.

Acedorkz commented 3 weeks ago

Thanks a lot. annezhu1212@outlook.com

yuanmingqi commented 3 weeks ago

Sent via email.

Acedorkz commented 3 weeks ago

Hi, I have tried the RIDE code you emailed on MultiRoom-N10-S10-v0. The reward converges to zero, which is inconsistent with the original RIDE paper. I am not sure whether I ran it correctly.

Moreover, pseudo_counts here is computed from the k-nearest neighbors of f(x_t) in the memory. In the original RIDE implementation, the count term is the square root of N_ep(s), the number of times state s has been visited during the current episode. I was wondering whether you have ever tried the N_ep(s) approach.

Thanks.
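For reference, the episodic count N_ep(s) described in the comment above can be sketched in a few lines. This is a minimal illustration, not the RLeXplore or original RIDE code: `episodic_count_scale` is a hypothetical name, and states are assumed to be hashable (for MiniGrid one could, for instance, hash the raw observation bytes).

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List


def episodic_count_scale(states: Iterable[Hashable]) -> List[float]:
    """Return 1 / sqrt(N_ep(s_t)) for each state visited in one episode.

    N_ep(s) is the number of visits to s so far in the current episode,
    counting the current visit, so the scale is 1.0 on a first visit.
    """
    counts: Counter = Counter()  # must be reset at the start of every episode
    scales = []
    for state in states:
        counts[state] += 1
        scales.append(1.0 / math.sqrt(counts[state]))
    return scales


# State "a" is visited three times, so its scale decays with each revisit.
print(episodic_count_scale(["a", "b", "a", "a"]))
```

Multiplying a per-step impact term by this scale gives a count-discounted bonus of the kind the comment refers to, which would replace the k-nearest-neighbor pseudo-count.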