Open Acedorkz opened 1 month ago
Hi. The versions are as follows:
To use unregistered envs like MiniGrid-MultiRoom-N12-S10, you need to register them first with code like:
import minigrid
from gymnasium.envs.registration import register

register(
    id="MiniGrid-MultiRoom-N10-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 10, "maxNumRooms": 10, "maxRoomSize": 10},
)
register(
    id="MiniGrid-MultiRoom-N12-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 12, "maxNumRooms": 12, "maxRoomSize": 10},
)
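Since the two calls differ only in the room count and the maximum room size, the registration arguments can be generated from those two numbers. The sketch below is a convenience only, not part of minigrid or rllte: `multiroom_spec` and `register_multiroom` are hypothetical helper names, and the ID pattern simply mirrors the IDs used above.

```python
def multiroom_spec(num_rooms: int, max_room_size: int) -> dict:
    """Build register() arguments for a fixed-size MultiRoom variant (hypothetical helper)."""
    return {
        "id": f"MiniGrid-MultiRoom-N{num_rooms}-S{max_room_size}-v0",
        "entry_point": "minigrid.envs:MultiRoomEnv",
        "kwargs": {
            "minNumRooms": num_rooms,
            "maxNumRooms": num_rooms,
            "maxRoomSize": max_room_size,
        },
    }


def register_multiroom(num_rooms: int, max_room_size: int) -> str:
    """Register the variant with gymnasium and return its ID."""
    # Imported lazily so multiroom_spec stays usable without gymnasium installed.
    from gymnasium.envs.registration import register

    spec = multiroom_spec(num_rooms, max_room_size)
    register(**spec)
    return spec["id"]
```

After `register_multiroom(12, 10)`, a call to `gym.make("MiniGrid-MultiRoom-N12-S10-v0")` should resolve, provided minigrid is installed.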
Thanks for the versions, which help a lot. Sorry, one more question about the encoder_model used in MultiRoom-N10-S10-v0: neither Espeholt nor Mnih works. Could you give some guidance on how to run on the MiniGrid envs to replicate the results at https://wandb.ai/yuanmingqi/RLeXplore/reportlist? Thanks again. Have a nice day.
Hi, the code for the MiniGrid task is an experimental version that hasn't been uploaded yet. You can adapt the current code as follows.
Change the env code to:
def make_minigrid_env(
    env_id: str = "MiniGrid-DoorKey-5x5-v0",
    num_envs: int = 8,
    fully_observable: bool = False,
    fully_numerical: bool = False,
    seed: int = 0,
    frame_stack: int = 4,
    device: str = "cpu",
    asynchronous: bool = False,
) -> Gymnasium2Torch:
    """Create MiniGrid environments.

    Args:
        env_id (str): Name of environment.
        num_envs (int): Number of environments.
        fully_observable (bool): Fully observable gridworld using a compact grid encoding instead of the agent view.
        fully_numerical (bool): Transforms the observation space (that has a textual component) to a fully numerical
            observation space, where the textual instructions are replaced by arrays representing the indices of each
            word in a fixed vocabulary.
        seed (int): Random seed.
        frame_stack (int): Number of stacked frames.
        device (str): Device to convert the data.
        asynchronous (bool): `True` for creating asynchronous environments,
            and `False` for creating synchronous environments.

    Returns:
        The vectorized environments.
    """

    def make_env(env_id: str, seed: int) -> Callable:
        def _thunk():
            env = gym.make(env_id)
            # env = RGBImgPartialObsWrapper(env)
            env = ImageTranspose(env)
            env = ImgObsWrapper(env)
            # env = ResizeObservation(env, 84)
            # env = FrameStack(env, k=frame_stack)
            env.action_space.seed(seed)
            env.observation_space.seed(seed)
            return env

        return _thunk

    envs = [make_env(env_id, seed + i) for i in range(num_envs)]
    if asynchronous:
        envs = AsyncVectorEnv(envs)
    else:
        envs = SyncVectorEnv(envs)
    envs = TransformReward(envs, lambda r: 100.0 * r)
    envs = RecordEpisodeStatistics(envs)
    return Gymnasium2Torch(envs, device=device)
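For reference when reading the logged returns: MiniGrid's default reward is paid only on reaching the goal, as `1 - 0.9 * (step_count / max_steps)`, and the `TransformReward` wrapper above multiplies it by 100. A minimal sketch of that arithmetic, where `scaled_minigrid_reward` is an illustrative name and the step numbers in the usage note are made up:

```python
def scaled_minigrid_reward(
    step_count: int, max_steps: int, reached_goal: bool, scale: float = 100.0
) -> float:
    """Reward seen by the agent after the 100x TransformReward scaling."""
    if not reached_goal:
        # Sparse reward: every step that does not reach the goal pays nothing.
        return 0.0
    return scale * (1.0 - 0.9 * (step_count / max_steps))
```

With `max_steps=240`, reaching the goal after 24 steps gives about 91, while an episode that never reaches the goal returns exactly 0. An episode return that stays at zero therefore means the goal was never reached, not that the scaling is off.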
Use a new MiniGrid encoder:
import torch
from torch import nn
from rllte.common.prototype import BaseEncoder

class MinigridEncoder(BaseEncoder):
    def __init__(self, observation_space, features_dim: int = 512) -> None:
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 16, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 32, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (2, 2)),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Compute shape by doing one forward pass
        with torch.no_grad():
            observations = observation_space.sample()
            observations = torch.as_tensor(observations[None]).float()
            n_flatten = self.cnn(observations).float().shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        # observations = observations.permute(0, 3, 1, 2).float()
        return self.linear(self.cnn(observations.float()))
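A likely reason the Mnih-style encoder cannot be used here (an assumption, not something confirmed in this thread) is input size: MiniGrid's default agent view is 7x7, smaller than the 8x8 first kernel of that network, whereas the 2x2 kernels above fit. The sketch below applies the standard convolution output-size formula with no padding; `conv_out` and `stack_out` are illustrative helpers, not rllte functions.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension; 0 if the kernel does not fit."""
    padded = size + 2 * padding
    if kernel > padded:
        return 0
    return (padded - kernel) // stride + 1


def stack_out(size: int, layers: list) -> int:
    """Apply conv_out for each (kernel, stride) pair; 0 as soon as one layer does not fit."""
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        if size == 0:
            return 0
    return size


MINIGRID_LAYERS = [(2, 1), (2, 1), (2, 1)]  # the three 2x2 convs above
MNIH_LAYERS = [(8, 4), (4, 2), (3, 1)]      # Nature-DQN style conv stack

side = stack_out(7, MINIGRID_LAYERS)    # 7 -> 6 -> 5 -> 4
n_flatten = 64 * side * side            # 64 channels * 4 * 4 = 1024
mnih_on_7 = stack_out(7, MNIH_LAYERS)   # 0: the 8x8 kernel does not fit a 7x7 input
mnih_on_84 = stack_out(84, MNIH_LAYERS) # 84 -> 20 -> 9 -> 7
```

Under these assumptions the MiniGrid encoder flattens to 1024 features before the linear layer, while the Mnih stack only produces a valid output on larger inputs such as 84x84 frames.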
Replace the old encoder, for example in E3B:
``` py
# build the encoder and inverse dynamics model
self.encoder = MinigridEncoder(observation_space=observation_space).to(self.device)

intrinsic_reward = E3B(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device=device,
    n_envs=args.n_envs,
    rwd_norm_type="rms",
    obs_rms=True,
    update_proportion=1.0,
    gamma=args.int_gamma,
    encoder_model=encoder_model,
    weight_init='orthogonal',
    beta=0.25,
    latent_dim=args.hidden_dim
)
```
I will update the code as soon as possible; you can try this in the meantime. Thanks!
Hi, thanks for your kind replies. I have tried the above code. Unfortunately, I still cannot replicate the results on MiniGrid-MultiRoom-N12-S10-v0. The reward is always zero. Perhaps some hyperparameters are not set appropriately. Could you please provide them? Thanks very much. I am also looking forward to the updated version.
Could you provide an email address? I can share the experiment code with you first.
Thanks a lot. annezhu1212@outlook.com
Sent via email.
Hi, I have tried the RIDE code you emailed on MultiRoom-N10-S10-v0. The reward converges to zero, which is inconsistent with the original RIDE paper. I am not sure whether I ran it correctly.
Moreover, the pseudo_counts here are computed from the k-nearest neighbors of f(x_t) in the memory. In the original RIDE implementation, the count term is the sqrt of N(ep_s), the number of times that state has been visited during the current episode. Have you ever tried the N(ep_s) version?
Thanks.
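For concreteness, the episodic-count variant asked about here could look like the sketch below. It assumes the RIDE form of the reward, the embedding change divided by the sqrt of N(ep_s) for the next state, and it assumes observations are flat sequences that can be hashed by their contents. `EpisodicCounter` and `ride_reward` are hypothetical names, not part of rllte.

```python
import math
from collections import defaultdict


class EpisodicCounter:
    """Counts visits per state within the current episode (hypothetical helper)."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def reset(self) -> None:
        # Call at every episode boundary so the counts stay per-episode.
        self.counts.clear()

    def visit(self, obs) -> int:
        # Assumption: obs is a flat sequence of numbers, hashed by its contents.
        key = tuple(obs)
        self.counts[key] += 1
        return self.counts[key]


def ride_reward(impact: float, counter: EpisodicCounter, next_obs) -> float:
    """Scale the embedding change `impact` by 1 / sqrt(N(ep_s)) of the next state."""
    return impact / math.sqrt(counter.visit(next_obs))
```

The first visit to a state leaves the reward unchanged, and each revisit within the same episode shrinks it. For image observations stored as NumPy arrays, `obs.tobytes()` would be one possible hash key instead of `tuple(obs)`.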
Hello, I am trying to run on the MiniGrid envs. Could you please tell me which versions of minigrid, gymnasium, and gym to use? I want to test on envs such as MiniGrid-ObstructedMaze-Full and MiniGrid-MultiRoom-N12-S10. Thanks very much for your help; I look forward to your reply.