❓ Question

I found that when running DQN, the logged values of `ep_len_mean` and `ep_rew_mean` are identical. Why does this happen? How can I solve this?
By running the example code:
```python
import gymnasium as gym

from stable_baselines3 import DQN

env = gym.make("CartPole-v1", render_mode="human")

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000, log_interval=4)
model.save("dqn_cartpole")

del model  # remove to demonstrate saving and loading

model = DQN.load("dqn_cartpole")

obs, info = env.reset()
while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```
From my understanding, `ep_len_mean` is the mean episode length and `ep_rew_mean` is the mean episode reward, so the two should not necessarily be equal. However, both the terminal output and the TensorBoard logs show the same value for both metrics.
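For reference, here is a minimal sketch of how I understand the two rollout statistics are computed, as means over recent per-episode records (the episode values below are made up for illustration, not taken from my run). Since `CartPole-v1` gives a reward of +1 on every step until the episode ends, each episode's return equals its length, which would make the two means coincide by construction:

```python
# Hypothetical episode records, in the shape the Monitor wrapper reports
# them ("r" = episode return, "l" = episode length); values are made up.
ep_info_buffer = [{"r": 22.0, "l": 22}, {"r": 15.0, "l": 15}, {"r": 31.0, "l": 31}]

# Rollout statistics are means over the recent-episode buffer.
ep_rew_mean = sum(ep["r"] for ep in ep_info_buffer) / len(ep_info_buffer)
ep_len_mean = sum(ep["l"] for ep in ep_info_buffer) / len(ep_info_buffer)

# In CartPole-v1 every step yields reward +1 until the pole falls,
# so each episode's return equals its length and the means are equal.
assert ep_rew_mean == ep_len_mean
```

If that reading is right, equal values would simply be expected for this environment rather than a bug, but I would like to confirm.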
Checklist
[X] I have checked that there is no similar issue in the repo