Closed: danielstankw closed this issue 3 years ago
Hello, please fill the custom env template completely.
@danielstankw, use the TensorBoard command like this (ref):
tensorboard --logdir ./a2c_cartpole_tensorboard/
@zeeshanalipanhwar thanks, I already figured out the issue I had :)
What was the issue and how did you resolve it?
Having a similar problem. When using a custom env, no logging occurs. Could there be something missing in the gym.Env class that causes this behaviour?
```python
import datetime
from stable_baselines3 import SAC

current_time = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
model = SAC('MlpPolicy', env, verbose=2, tensorboard_log="./runs/", policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000, tb_log_name=current_time, log_interval=1)
```
This creates the right folder and the events.out.tfevents file. When using the custom env, the size of the file stays at 40 bytes. When using the Pendulum-v0 env, the file size grows as expected.
Checked custom env with:
```python
from stable_baselines3.common.env_checker import check_env
print(check_env(env))
```
python=3.7, torch=1.10.2, MacBook Air M1
@SamsTheGreatest It is hard to debug without access to the full code, but the custom env should not affect this (all the logging is done in SB3 code, not in the environment). One possible thing that comes to mind is that the episodes are longer in the custom env and that slows down logging (and with 1000 steps there are no completed episodes), but beyond that I do not have tips to give :/
If you spot a bug you can replicate, please open up a new issue with code to replicate the issue.
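For what it's worth, a minimal diagnostic sketch along those lines (assuming `env` is the custom environment instance from the snippet above, following the classic gym step API): count how many steps a single episode takes. If this exceeds the 1000 timesteps passed to `learn()`, no episode ever completes and the episode statistics are never written to the event file.

```python
# Diagnostic sketch: measure episode length with random actions.
# Assumption: `env` is the custom environment used with SAC above.
obs = env.reset()
done, steps = False, 0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    steps += 1
print(f"Episode finished after {steps} steps")
```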
@Miffyli makes sense. I tested it and that is exactly the issue. Does the logger only write after an episode, with no way to set the number of steps after which writing occurs?
The exact time of logging depends on the algorithm and its settings. E.g. PPO logs most of the stuff after every rollout (after n_steps * n_envs steps). SAC and other off-policy algorithms seem to log after every completed episode (this is because you cannot log episode rewards until you complete an episode).
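As a rough sketch of one way to force more frequent writes (this is not a built-in SB3 option; the callback below is a hypothetical helper, assuming an SB3 version where the model exposes `model.logger`), a custom callback can flush the logger every fixed number of environment steps. Note that the episode reward statistics will still only appear once episodes actually complete.

```python
from stable_baselines3.common.callbacks import BaseCallback

class DumpLoggerEveryNSteps(BaseCallback):
    """Hypothetical helper: flush recorded values to TensorBoard every `dump_freq` env steps."""

    def __init__(self, dump_freq: int = 100, verbose: int = 0):
        super().__init__(verbose)
        self.dump_freq = dump_freq
        self._last_dump = 0

    def _on_step(self) -> bool:
        if self.num_timesteps - self._last_dump >= self.dump_freq:
            # Write whatever has been recorded so far to the event file.
            self.model.logger.dump(self.num_timesteps)
            self._last_dump = self.num_timesteps
        return True

# Usage sketch:
# model.learn(total_timesteps=1000, callback=DumpLoggerEveryNSteps(dump_freq=100))
```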
@Miffyli I had the same exact issue with PPO on a custom gym environment. In my case, the computation time per step is fairly high, which is why I set log_interval to one and expected logging after each step (or at least after each episode).
I think the parameter's naming is a bit confusing and may lead to unexpected behaviour, especially when working with low frames-per-second environments.
@stwerner97 I understand the issue, but on the other hand there is not much to log after a few steps in the environment: only a rough sample of the environment's FPS, the time elapsed and the number of steps done. The episode stats only appear once episodes are completed (i.e. they are not updated every step), and there are no training metrics until the first training rollout is done after n_steps. I recommend looking at, say, SAC or TD3 if you have a very low FPS environment (they are more sample efficient).
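For completeness, a small sketch of what that could look like with the same custom `env` (the log directory name here is made up). For off-policy algorithms such as SAC, `log_interval` is counted in episodes, so `log_interval=1` writes logs after every completed episode.

```python
from stable_baselines3 import SAC

# Sketch: an off-policy algorithm for a slow (low-FPS) environment.
# For SAC/TD3, log_interval is in episodes, so this logs after every completed episode.
model = SAC("MlpPolicy", env, verbose=1, tensorboard_log="./sac_tensorboard/")
model.learn(total_timesteps=10_000, log_interval=1)
```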
@Miffyli I completely agree, although the documentation could point out when logs are written.
Yes, I am using PPO as a baseline to judge the severity of sample inefficiency. Thanks for the response!
> I completely agree, although the documentation could point out when logs are written.
This sounds like a nice addition to the docs, and would be happy to review a PR if you feel like contributing a little :)
Sure, I will open a PR later today :)
Important Note: We do not do technical support or consulting and do not answer personal questions via email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
Question
I am running my learning algorithm on a custom environment created using Robosuite. I followed the documentation and tried to use TensorBoard, but the file used by TensorBoard doesn't get created.
```python
model = PPO('MlpPolicy', env, verbose=1, n_steps=4, batch_size=4, tensorboard_log="./ppo_tensorboard/")
model.learn(total_timesteps=8, tb_log_name='learning')
```
I run it just for a few steps to see if the file used by TensorBoard gets created, but it doesn't. Any suggestions?
Checklist