RLeXplore provides stable baselines of exploration methods in reinforcement learning, such as the intrinsic curiosity module (ICM), random network distillation (RND), and rewarding impact-driven exploration (RIDE).
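For reference, attaching one of these modules to an rllte agent for intrinsically motivated pre-training typically looks like the sketch below. This is only a minimal sketch: it assumes the rllte.xplore.reward module names, the agent.set(reward=...) hook, and the train(num_train_steps=...) signature from the rllte quick-start, so check them against your installed version.

    # Minimal sketch: pre-training a PPO agent with an RND intrinsic reward.
    # Assumes rllte exposes RND under rllte.xplore.reward and that agents
    # accept the module via agent.set(reward=...); verify against your version.
    import torch as th
    from rllte.agent import PPO
    from rllte.env import make_mario_env
    from rllte.xplore.reward import RND

    if __name__ == '__main__':
        device = 'cuda' if th.cuda.is_available() else 'cpu'
        envs = make_mario_env('SuperMarioBros-1-1-v0', device=device, num_envs=8,
                              asynchronous=False, frame_stack=4, gray_scale=True)
        agent = PPO(envs, device=device, pretraining=True)
        # The intrinsic reward module drives the agent during pre-training.
        irs = RND(observation_space=envs.observation_space,
                  action_space=envs.action_space,
                  device=device)
        agent.set(reward=irs)
        agent.train(num_train_steps=1_500_000)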
I am running the following code to evaluate the model I obtained:
    import torch as th
    from rllte.env import make_mario_env
    from rllte.agent import PPO

    if __name__ == '__main__':
        n_steps: int = 2048 * 16
        device = 'cuda' if th.cuda.is_available() else 'cpu'

        # Build a single Mario environment with the same preprocessing used in training.
        envs = make_mario_env('SuperMarioBros-1-1-v0', device=device, num_envs=1,
                              asynchronous=False, frame_stack=4, gray_scale=True)
        print(device, envs.observation_space, envs.action_space)

        # Re-create the PPO agent and load the frozen pre-trained weights.
        agent = PPO(envs,
                    device=device,
                    batch_size=512,
                    n_epochs=10,
                    num_steps=n_steps // 8,
                    pretraining=False)
        agent.freeze(init_model_path="pretrained_1507328.pth")
        agent.eval_env = envs
        agent.eval(3)
However, checking Mario's x_pos at the end of each episode, I noticed that all three evaluation runs behave deterministically and return exactly the same result. Is there a way to avoid this?
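For example, a per-run variation like the sketch below is roughly what I have in mind; note that the seed keyword of make_mario_env is an assumption on my part (the other arguments are the ones from my script above), so it may need adapting to the actual factory signature.

    # Sketch: rebuild the evaluation environment with a different seed per run
    # and reseed torch, so the three rollouts are not forced onto the same
    # trajectory. The seed keyword of make_mario_env is an assumption here.
    import torch as th
    from rllte.agent import PPO
    from rllte.env import make_mario_env

    if __name__ == '__main__':
        device = 'cuda' if th.cuda.is_available() else 'cpu'
        for run_seed in (1, 2, 3):
            th.manual_seed(run_seed)
            eval_envs = make_mario_env('SuperMarioBros-1-1-v0', device=device,
                                       num_envs=1, asynchronous=False,
                                       frame_stack=4, gray_scale=True,
                                       seed=run_seed)  # assumed keyword
            agent = PPO(eval_envs, device=device, batch_size=512, n_epochs=10,
                        num_steps=2048 * 2, pretraining=False)
            agent.freeze(init_model_path="pretrained_1507328.pth")
            agent.eval_env = eval_envs
            agent.eval(1)  # one episode per differently seeded environment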