DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] DQN: ep_len_mean and ep_rew_mean outputs are identical #1918

Closed AnnyOrange closed 4 months ago

AnnyOrange commented 5 months ago

❓ Question

I found that when running DQN, the ep_len_mean and ep_rew_mean outputs are the same. Why does this happen? How can I fix it?

Running the example code:

import gymnasium as gym
from stable_baselines3 import DQN

# Train a DQN agent on CartPole and save it
env = gym.make("CartPole-v1", render_mode="human")
model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000, log_interval=4)
model.save("dqn_cartpole")

del model  # remove to demonstrate saving and loading
model = DQN.load("dqn_cartpole")

# Run the trained policy, resetting whenever an episode ends
obs, info = env.reset()
while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

As I understand it, ep_len_mean is the mean episode length and ep_rew_mean is the mean episode reward, so the two should not be the same. However, in the terminal output:

[screenshot: terminal training log]

and in the TensorBoard output:

[screenshot: TensorBoard rollout curves]

the two values are identical.
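
For context, a minimal sketch (not part of the original report) of where these numbers come from: SB3 wraps the environment in a Monitor, which records each finished episode's return "r" and length "l"; the logged values are the means over the algorithm's ep_info_buffer. Assuming a trained model as above:

import numpy as np
# ep_rew_mean / ep_len_mean are the means over the buffered episode infos
ep_rew_mean = np.mean([ep_info["r"] for ep_info in model.ep_info_buffer])
ep_len_mean = np.mean([ep_info["l"] for ep_info in model.ep_info_buffer])
print(ep_rew_mean, ep_len_mean)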


qgallouedec commented 5 months ago

CartPole is a special case: the agent gets a +1.0 reward for every timestep it survives, so an episode's total reward always equals its length. That's why the two curves coincide here; in general they don't.
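
A quick way to verify this (a minimal sketch, not from the maintainer's reply): roll out one random episode and confirm the per-step reward is always +1.0, which forces the return to equal the length.

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
ep_len, ep_rew = 0, 0.0
while True:
    # Every CartPole step yields reward 1.0, so return and length move in lockstep
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    ep_len += 1
    ep_rew += reward
    if terminated or truncated:
        break
print(ep_len, ep_rew)  # identical values, e.g. 17 and 17.0
env.close()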