Closed: danielstankw closed this issue 3 years ago
Hello, please fill the custom env template completely.
@danielstankw, use the TensorBoard command like this (ref):
tensorboard --logdir ./a2c_cartpole_tensorboard/
@zeeshanalipanhwar thanks, I already figured out the issue I had :)
What was the issue and how did you resolve it?
Having a similar problem. When using a custom env, no logging occurs. Could there be something missing in the gym.Env class that causes this behaviour?
```python
import datetime
from stable_baselines3 import SAC

current_time = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
model = SAC('MlpPolicy', env, verbose=2, tensorboard_log="./runs/", policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000, tb_log_name=current_time, log_interval=1)
```
This creates the right folder and the events.out.tfevents file. When using the custom env, the size of the file stays at 40 bytes. When using the Pendulum-v0 env, the file size grows as expected.
Checked custom env with:
```python
from stable_baselines3.common.env_checker import check_env
print(check_env(env))
```
python=3.7, torch=1.10.2, MacBook Air M1
@SamsTheGreatest It is hard to debug without access to the full code, but the custom env should not affect this (all the logging is done in SB3 code, not in the environment). One possible thing that comes to mind is that the episodes are longer in the custom env and that slows down logging (and with 1000 steps there are no completed episodes), but beyond that I do not have tips to give :/
If you spot a bug you can replicate, please open up a new issue with code to replicate the issue.
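For what it's worth, a minimal diagnostic sketch along those lines (assuming `env` is the custom environment instance from the snippet above, following the classic gym step API): count how many steps a single episode takes. If this exceeds the 1000 timesteps passed to `learn()`, no episode ever completes and the episode statistics are never written to the event file.

```python
# Diagnostic sketch: measure episode length with random actions.
# Assumption: `env` is the custom environment used with SAC above.
obs = env.reset()
done, steps = False, 0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    steps += 1
print(f"Episode finished after {steps} steps")
```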
@Miffyli makes sense. I tested it and that is exactly the issue. Does the logger only write after an episode, with no way to set the number of steps after which writing occurs?
The exact time of logging depends on the algorithm and its settings. E.g. PPO logs most of the stuff after every rollout (after n_steps * n_envs steps). SAC and other off-policy algorithms seem to log after every completed episode (this is because you cannot log episode rewards until you complete an episode).
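As a rough sketch of one way to force more frequent writes (this is not a built-in SB3 option; the callback below is a hypothetical helper, assuming an SB3 version where the model exposes `model.logger`), a custom callback can flush the logger every fixed number of environment steps. Note that the episode reward statistics will still only appear once episodes actually complete.

```python
from stable_baselines3.common.callbacks import BaseCallback

class DumpLoggerEveryNSteps(BaseCallback):
    """Hypothetical helper: flush recorded values to TensorBoard every `dump_freq` env steps."""

    def __init__(self, dump_freq: int = 100, verbose: int = 0):
        super().__init__(verbose)
        self.dump_freq = dump_freq
        self._last_dump = 0

    def _on_step(self) -> bool:
        if self.num_timesteps - self._last_dump >= self.dump_freq:
            # Write whatever has been recorded so far to the event file.
            self.model.logger.dump(self.num_timesteps)
            self._last_dump = self.num_timesteps
        return True

# Usage sketch:
# model.learn(total_timesteps=1000, callback=DumpLoggerEveryNSteps(dump_freq=100))
```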
@Miffyli I had the same exact issue with PPO on a custom gym environment. In my case, the computation time per step is fairly high, which is why I set log_interval to one and expected logging after each step (or at least after each episode).
I think the parameter's naming is a bit confusing and may lead to unexpected behaviour, especially when working with low frames-per-second environments.
@stwerner97 I understand the issue, but on the other hand there is not much to log after a few steps in the environment: only a rough sample of the environment's FPS, the time elapsed and the number of steps done. The episode stats only appear once episodes are completed (i.e. they are not updated every step), and there are no training metrics until the first training rollout is done after n_steps. I recommend looking at, say, SAC or TD3 if you have a very low FPS environment (they are more sample efficient).
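For completeness, a small sketch of what that could look like with the same custom `env` (the log directory name here is made up). For off-policy algorithms such as SAC, `log_interval` is counted in episodes, so `log_interval=1` writes logs after every completed episode.

```python
from stable_baselines3 import SAC

# Sketch: an off-policy algorithm for a slow (low-FPS) environment.
# For SAC/TD3, log_interval is in episodes, so this logs after every completed episode.
model = SAC("MlpPolicy", env, verbose=1, tensorboard_log="./sac_tensorboard/")
model.learn(total_timesteps=10_000, log_interval=1)
```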
@Miffyli I completely agree, although the documentation could point out when logs are written.
Yes, I am using PPO as a baseline to judge the severity of sample inefficiency. Thanks for the response!
> I completely agree, although the documentation could point out when logs are written.
This sounds like a nice addition to the docs, and would be happy to review a PR if you feel like contributing a little :)
Sure, I will open a PR later today :)
Important Note: We do not do technical support or consulting and do not answer personal questions via email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
Question
I am running my learning algorithm on a custom environment created using Robosuite. I followed the documentation and tried to use TensorBoard, but the file used by TensorBoard doesn't get created.
```python
model = PPO('MlpPolicy', env, verbose=1, n_steps=4, batch_size=4, tensorboard_log="./ppo_tensorboard/")
model.learn(total_timesteps=8, tb_log_name='learning')
```
I run it just for a few steps to see if the file used by TensorBoard gets created, but it doesn't. Any suggestions?
Checklist