RLE-Foundation / RLeXplore

RLeXplore provides stable baselines of exploration methods in reinforcement learning, such as the intrinsic curiosity module (ICM), random network distillation (RND), and rewarding impact-driven exploration (RIDE).
https://docs.rllte.dev/
MIT License
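For context on the methods named above, here is a minimal sketch of the RND idea in PyTorch: a frozen, randomly initialized target network and a trained predictor, where the predictor's error on an observation serves as the intrinsic reward. This is an illustrative sketch only, not RLeXplore's actual implementation; the class name, layer sizes, and embedding dimension are made up.

    import torch as th
    from torch import nn

    class RNDReward(nn.Module):
        """Intrinsic reward as prediction error against a frozen random target."""
        def __init__(self, obs_dim: int, embed_dim: int = 128) -> None:
            super().__init__()
            # Fixed, randomly initialized target network (never trained).
            self.target = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
            for p in self.target.parameters():
                p.requires_grad_(False)
            # Predictor network, trained to match the target's outputs.
            self.predictor = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

        def forward(self, obs: th.Tensor) -> th.Tensor:
            # Per-observation MSE between predictor and target embeddings.
            return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

Novel states produce large prediction errors and hence a large exploration bonus; frequently visited states become predictable and their bonus decays.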

Deterministic behavior in evaluation #18


edofazza commented 5 months ago

I am running the following code to evaluate the model I obtained:

import torch as th
from rllte.env import make_mario_env
from rllte.agent import PPO

if __name__ == '__main__':
    n_steps: int = 2048 * 16
    device = 'cuda' if th.cuda.is_available() else 'cpu'

    # Single synchronous Mario environment with 4 stacked grayscale frames.
    envs = make_mario_env('SuperMarioBros-1-1-v0', device=device, num_envs=1,
                          asynchronous=False, frame_stack=4, gray_scale=True)
    print(device, envs.observation_space, envs.action_space)

    agent = PPO(envs,
                device=device,
                batch_size=512,
                n_epochs=10,
                num_steps=n_steps // 8,
                pretraining=False)

    # Load the pretrained weights and run three evaluation episodes.
    agent.freeze(init_model_path="pretrained_1507328.pth")
    agent.eval_env = envs
    agent.eval(3)

But when checking Mario's x_pos at the end of each episode, I noticed that the agent behaves deterministically across all three evaluation episodes, returning the same result every time. Is there a way to avoid this?
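For background on what is being asked: if the policy acts greedily (argmax over action logits) and the environment always resets to the same initial state, every rollout is identical. A minimal sketch of the two usual remedies, stochastic action selection and per-episode reset seeding, assuming a policy that outputs logits and a Gymnasium-style environment; select_action, env, and policy are hypothetical names, not RLLTE's API:

    import torch as th
    from torch.distributions import Categorical

    def select_action(logits: th.Tensor, deterministic: bool) -> th.Tensor:
        if deterministic:
            # Greedy: the same observation always maps to the same action.
            return logits.argmax(dim=-1)
        # Stochastic: sample from the policy distribution, so repeated
        # evaluations from the same state can differ.
        return Categorical(logits=logits).sample()

    # Varying the reset seed per episode also breaks identical rollouts
    # (Gymnasium-style reset; env and policy are hypothetical here):
    # for ep in range(3):
    #     obs, info = env.reset(seed=ep)
    #     ...

Whether RLLTE's eval() exposes a switch for stochastic evaluation is exactly what this issue is asking the maintainers.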

yuanmingqi commented 5 months ago

@roger-creus Roger, can you help with this issue?