Closed cisprague closed 4 years ago
Hello,
But, in Stable Baselines 3, nothing happens.
What do you mean?
A TF event log isn't created after starting learning.
Then please fill up the issue template completely.
Using latest (version in master) SB3 (0.8.0a4):
from stable_baselines3 import SAC
model = SAC('MlpPolicy', 'Pendulum-v0', tensorboard_log='/tmp/sb3/').learn(10000)
and then
tensorboard --logdir /tmp/sb3/
I can visualize all the graphs... and the tf event file is there.
Note: in SB3, the tf event file is generated only when .learn() is called. The graph visualization was disabled because it was not so easy to read (see #30).
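As a sanity check after .learn() returns, one can search the log directory for event files instead of relying on the TensorBoard UI. The following is only a minimal sketch: the helper name find_event_files is hypothetical (not part of SB3), and the file pattern assumes the usual events.out.tfevents.* naming.

```python
import glob
import os


def find_event_files(log_dir):
    """Return all TensorBoard event files found anywhere under log_dir."""
    # Runs are written into subfolders (for example SAC_1), so search recursively.
    pattern = os.path.join(log_dir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # /tmp/sb3/ is the tensorboard_log directory used in the example above.
    files = find_event_files("/tmp/sb3/")
    if files:
        print("Found event files:")
        for path in files:
            print("  ", path)
    else:
        print("No event files found; was .learn() called and is tensorboard installed?")
```

An empty result after training points to the writer never being created, rather than to a display problem in TensorBoard.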
Hi, again. I'm now using PPO
and I get the same result. Could there be any other reason that the log file is not generated? I am able to get callbacks working for evaluation and saving, but I still can't get the Tensorboard log to be generated.
Did you try with the code i wrote? (at least as a sanity check)
Yup, I don't get a Tensorboard file with that either. :/ Could it be something to do with other libraries?
Got it! I needed to uninstall my current version, then reinstall with the [extra] tag.
Thanks for your help.
Hello, your solution does not sort the issue on my side.
I was running a training script (SB3/PPO) on Google Colab (installing SB3 without the [extra] tag) and everything was working perfectly. Then I migrated to a Microsoft Azure VM, and now I cannot get the TensorBoard logs written to the tensorboard_log directory, although my script is exactly the same. I tried installing SB3 with the [extra] tag but it did not help.
Here are the relevant parts of the script which is located in the root_path folder:
import os

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.ppo import CnnPolicy

# env_kw and ppo_kw are dicts defined elsewhere in the script
root_path = os.sys.path[0]
logs_path = os.path.join(root_path, 'logs')
logs_path_eval = os.path.join(root_path, 'logs_eval')

env = make_vec_env('my_custom_env', n_envs=16, env_kwargs=env_kw)
eval_env = gym.make('my_custom_env', **env_kw)

model = PPO(
    policy=CnnPolicy,
    env=env,
    verbose=1,
    tensorboard_log=logs_path,
    create_eval_env=True,
    **ppo_kw,
)

model.learn(
    total_timesteps=100000,
    tb_log_name='agent_test',
    eval_env=eval_env,
    eval_freq=10000,
    n_eval_episodes=10,
    eval_log_path=logs_path_eval,
    reset_num_timesteps=False,
)
Any idea?
@johanmonard
You are storing logs in a directory under os.sys.path[0], not the local directory. I guess it is common for . to be the first item in PATH on Linux, but there is no rule that says it has to be so. On my Windows 10, os.sys.path[0] is C:\\Users\\Anssi\\AppData\\Local\\Programs\\Python\\Python37\\Scripts.
Change the first line to root_path = "." or root_path = os.curdir (note that os.curdir is a string, not a function) and that should do the trick.
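To see where the logs would actually end up, it can help to print the candidate root directories and the resolved log path before training. This is only a sketch: the helper names describe_roots and resolve_log_dir are hypothetical. Per the Python documentation, sys.path[0] is normally the directory of the script that was used to start the interpreter, which may differ from the current working directory.

```python
import os
import sys


def describe_roots():
    """Return the two directories that are easy to confuse as the 'root'."""
    return {
        "sys.path[0]": sys.path[0],  # usually the directory of the launched script
        "cwd": os.getcwd(),          # directory the process was started from
    }


def resolve_log_dir(root, name="logs"):
    """Return the absolute path that os.path.join(root, name) refers to."""
    return os.path.abspath(os.path.join(root, name))


if __name__ == "__main__":
    for label, path in describe_roots().items():
        print(f"{label}: {path!r}")
    print("logs relative to '.':", resolve_log_dir(os.curdir))
    print("logs relative to sys.path[0]:", resolve_log_dir(sys.path[0]))
```

If the two resolved paths differ, the event files may simply be written somewhere other than the folder being inspected.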
@Miffyli
Still the same problem. Note that even with os.sys.path[0], the other folders and paths I am using work perfectly:
agents_path = os.path.join(root_path, 'agents')
logs_path = os.path.join(root_path, 'logs')
logs_path_eval = os.path.join(root_path, 'logs_eval')
I save agents periodically in agents_path and it works, and I use logs_path_eval in model.learn(eval_log_path=logs_path_eval) and it does save the best model there, so there must be something else, but I don't get any error message...
Ah, sorry that I missed this!
Hmm I would confirm that the code first works on local machines (outside colab). If that does not work it might be a bug somewhere (or something else in the code interferes with things). If the directory were not writeable it should throw an exception. If code works locally then something in Azure is interfering, and that sadly goes beyond these issues :/.
What version of tensorboard do you have?
The issue is here. SB3 tries to import tensorboard, and if it is not installed, it just silently ignores it. There should definitely be a log warning if the import failed and the user tries to write a tensorboard log. Just make sure tensorboard is actually installed on your system.
Good catch! User definitely should be informed about this (tensorboard_log_path is not None but SummaryWriter is None).
@araffin Maybe something you could fix in #469 as it touches the same parts of code?
done ;) It will now throw an ImportError.
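The behaviour discussed here (fail loudly when a tensorboard log path is given but tensorboard cannot be imported) can be illustrated roughly as follows. This is not SB3's actual implementation; the function name check_tensorboard and its module_name parameter are assumptions made purely for illustration.

```python
import importlib.util
from typing import Optional


def check_tensorboard(tensorboard_log: Optional[str], module_name: str = "tensorboard") -> bool:
    """Return True if logging was requested and the module is importable.

    Returns False when no log path was requested, and raises ImportError
    when a path was requested but the module is missing.
    """
    if tensorboard_log is None:
        # The user did not ask for tensorboard logging; nothing to check.
        return False
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Trying to log to {tensorboard_log!r} but {module_name} is not installed."
        )
    return True


if __name__ == "__main__":
    try:
        check_tensorboard("./logs/")
        print("tensorboard is available; logs can be written")
    except ImportError as err:
        print("Error:", err)
```

Raising at setup time, rather than skipping the writer silently, makes the missing dependency visible before any training time is spent.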
Describe the bug
In Stable Baselines, if I train sac.SAC with tensorboard_log='./logs/', I get a Tensorboard log in ./logs/SAC_1/. But, in Stable Baselines 3, with the same keyword argument, a Tensorboard log is not generated.
Code example
If I run my training script with the original Stable Baselines, I get a log at ./logs/SAC_1/events.out.tfevents.1594902758.machinename. But, if I run the equivalent in this updated version, the previous log generation doesn't happen.