[Bug] Tensorboard logging not logging every log_interval timesteps

YannBerthelot commented 2 years ago

🐛 Bug

The documentation for the DQN agent (https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) states that the log_interval parameter is "The number of timesteps before logging". However, when it is set to 1 (or any other value), logging does not happen at that pace: it happens every log_interval episodes, not timesteps. In the example below, that means a log entry every 200 timesteps.

To Reproduce

import gym

from stable_baselines3 import DQN

env = gym.make("MountainCar-v0")
model = DQN("MlpPolicy", env, tensorboard_log="logs")
# log_interval=1 is documented as "every timestep", but logs appear once per episode
model.learn(total_timesteps=2000, log_interval=1)

Expected behavior

I would have expected to see logging every timestep, not every episode. Either the behavior should be switched (but logging every n episodes is useful too ...) or the documentation should be updated.
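
To make the difference concrete, here is a toy simulation of the two interpretations of log_interval. It does not use Stable-Baselines3 at all: log_timesteps is a hypothetical helper written for this illustration, and the fixed episode length of 200 is an assumption taken from the 200-timestep spacing observed in the example above.

```python
def log_timesteps(total_timesteps, episode_length, log_interval, unit):
    """Return the timesteps at which a log dump would happen.

    unit="episode":  dump every `log_interval` finished episodes (observed behaviour).
    unit="timestep": dump every `log_interval` timesteps (what the docs describe).
    Episodes are assumed to all last exactly `episode_length` steps.
    """
    if unit not in ("episode", "timestep"):
        raise ValueError("unit must be 'episode' or 'timestep'")

    dumps = []
    episodes_done = 0
    for step in range(1, total_timesteps + 1):
        episode_ended = step % episode_length == 0
        if episode_ended:
            episodes_done += 1

        if unit == "episode":
            # Only consider dumping at the end of an episode.
            if episode_ended and episodes_done % log_interval == 0:
                dumps.append(step)
        elif step % log_interval == 0:
            dumps.append(step)
    return dumps


per_episode = log_timesteps(2000, 200, 1, "episode")
per_timestep = log_timesteps(2000, 200, 1, "timestep")
print(len(per_episode), per_episode[:3])
print(len(per_timestep), per_timestep[:3])
```

With log_interval=1 and 2000 timesteps, the episode-based rule produces 10 dumps spaced 200 timesteps apart, while the timestep-based rule would produce 2000.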

System Info

OS: Linux-5.4.144+-x86_64-with-Ubuntu-18.04-bionic #1 SMP Tue Dec 7 09:58:10 PST 2021
Python: 3.7.12
Stable-Baselines3: 1.3.0
PyTorch: 1.10.0+cu111
GPU Enabled: False
Numpy: 1.19.5
Gym: 0.17.3

Checklist

[x] I have checked that there is no similar issue in the repo (required)
[x] I have read the documentation (required)
[x] I have provided a minimal working example to reproduce the bug (required)

Miffyli commented 2 years ago

Thanks for reporting this! As per the Discord chats, we should indeed update the documentation to reflect this behaviour (potentially for the other off-policy algos as well). I can try to fix this quickly, but if you want to get some GitHub experience, we would love to accept a PR from you that fixes this :)

m1kol commented 2 years ago

Yeah, I have had the same experience with the DDPG algorithm on the Cartpole environment. And it is not only TensorBoard, but logging in general: I used HumanOutputFormat and MLflowCustomFormat, provided in the documentation.