Closed: stelladk closed this issue 1 year ago
Same problem actually happened to me, just with the TD3/PPO models.
My specs are similar:
installed sb3 with `pip install "stable_baselines3[extra]>=2.0.0a9"`
Custom environment that is very simple: a multi-agent maze using only numpy.
Logging to tensorboard with A2C for example worked fine.
My environment uses Box action space and Dictionary observation space.
I can't reproduce what you describe. I'm using exactly the same code (I've just added `train_SAC_policy("Pendulum-v1", 10_000)`) and I get correct tensorboard logging, and:
% ls -alt models/SAC_Pendulum-v1_t10000/SAC_1
total 16
drwxr-xr-x 5 quentingallouedec staff 160 Jun 15 10:14 ..
-rw-r--r-- 1 quentingallouedec staff 5404 Jun 15 10:14 events.out.tfevents.1686816821.MacBook-Pro-de-Quentin.local.8647.0
drwxr-xr-x 3 quentingallouedec staff 96 Jun 15 10:13 .
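For anyone reproducing the check above, a minimal equivalent of that setup might look like this (a sketch only; the actual `train_SAC_policy` helper from the issue is not reproduced here, and the log directory simply mirrors the listing above):

```python
from stable_baselines3 import SAC

# Minimal reproduction sketch (not the exact train_SAC_policy helper used in this thread)
model = SAC("MlpPolicy", "Pendulum-v1",
            tensorboard_log="models/SAC_Pendulum-v1_t10000", verbose=1)
model.learn(total_timesteps=10_000)  # creates models/SAC_Pendulum-v1_t10000/SAC_1/events.out.tfevents...
# Inspect the run with:  tensorboard --logdir models/SAC_Pendulum-v1_t10000
```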
It has to do with the custom environment then? Is there a function we need to override for tensorboard integration with custom envs? I also notice we have a different Python version. @ronTohar1 what Python version are you using?
> I also notice we have a different Python version. @ronTohar1 what Python version are you using?
I've just tried with Python 3.8, it works as well.
> Is there a function we need to override for tensorboard integration with custom envs?
No. Have you checked your custom env (with the env checker)? Does it work with Pendulum in your setting?
My guess is that it is related to your custom env and that you also don't have any info in the terminal. The reason is that SAC logs things every 4 episodes by default, whereas PPO/A2C log every n steps. A solution is to force logging using a callback (see the documentation and the sketch below).
EDIT: my guess is that you have very long episodes.
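A minimal sketch of such a callback (assuming SB3's standard `BaseCallback` API; the class name and dump frequency are illustrative, not a prescribed solution):

```python
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback

class ForceLogCallback(BaseCallback):
    """Dump the logger every `dump_freq` environment steps instead of waiting for episode ends."""

    def __init__(self, dump_freq: int = 1_000, verbose: int = 0):
        super().__init__(verbose)
        self.dump_freq = dump_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.dump_freq == 0:
            # write whatever has been recorded so far to stdout/tensorboard
            self.logger.dump(step=self.num_timesteps)
        return True

model = SAC("MlpPolicy", "Pendulum-v1", tensorboard_log="./sac_tb")
model.learn(total_timesteps=10_000, callback=ForceLogCallback(dump_freq=1_000))
```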
@araffin, ok I see. I guessed it was related to the datatype of the tensors, but in fact I bet it is related to the `done` value in `step`.
It works fine on my custom env, as I defined a proper `done`.
@stelladk Could you check whether `done` is well defined then? Like, set to true at some point?
Then, if that is the reason, it would be nice to have the env checker verify that `done` is well implemented. But this is not possible, I guess?
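For reference, the existing checker mainly validates the API (spaces, dtypes, the reset/step return values) and does not verify that `terminated` is ever reachable; running it is a one-liner (`MyMazeEnv` below is a hypothetical placeholder for the custom env class):

```python
from stable_baselines3.common.env_checker import check_env

env = MyMazeEnv()          # hypothetical custom env class; use your own here
check_env(env, warn=True)  # warns/raises if the Gymnasium API is not respected
```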
Thanks. I am using Python 3.11.4, btw.
What do you mean by a well-defined `done`? Do you mean that eventually `terminated` or `truncated` becomes true?
Also, what bothers me is: how can it be that A2C logs fine and PPO doesn't log at all?
I can send a link to my repository with the code and the environment, as it is very simple and it will be easy to reproduce the results (the env is just an n×n board with agents and goals).
Indeed, the environment does not change the `terminated` or `truncated` flags. However, I also noticed that `Pendulum-v1` on Gymnasium also does not change them (`return self._get_obs(), -costs, False, False, {}`) and it is also not working in my case. The environment `MountainCarContinuous-v0` has an actual break condition and it logs normally on tensorboard.
Just so I make it clear: my termination and truncation values are well defined and returned from the step function as expected, true when finished and false otherwise.
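To make that concrete, here is a minimal sketch of a Gymnasium env with a Box action space, a Dict observation space, and explicit `terminated`/`truncated` handling (purely illustrative, not the actual maze env from this thread):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class TinyGridEnv(gym.Env):
    """Toy n x n grid: Box actions, Dict observations, explicit terminated/truncated flags."""

    def __init__(self, size: int = 5, max_steps: int = 50):
        super().__init__()
        self.size, self.max_steps = size, max_steps
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Dict({
            "agent": spaces.Box(0.0, size - 1.0, shape=(2,), dtype=np.float32),
            "goal": spaces.Box(0.0, size - 1.0, shape=(2,), dtype=np.float32),
        })

    def _obs(self):
        return {"agent": self.agent, "goal": self.goal}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        self.agent = self.np_random.uniform(0, self.size - 1, size=2).astype(np.float32)
        self.goal = self.np_random.uniform(0, self.size - 1, size=2).astype(np.float32)
        return self._obs(), {}

    def step(self, action):
        self.steps += 1
        self.agent = np.clip(self.agent + action, 0, self.size - 1).astype(np.float32)
        dist = float(np.linalg.norm(self.agent - self.goal))
        terminated = dist < 0.5                   # goal reached -> the episode really ends
        truncated = self.steps >= self.max_steps  # time limit -> the episode is cut off
        return self._obs(), -dist, terminated, truncated, {}
```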
Pendulum, like most Gym envs, has a timeout (the TimeLimit wrapper); it's defined when registering the env.
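In other words, Pendulum's truncation comes from its registration, not from its `step()`. A rough sketch of two ways to get the same behaviour for a custom env (the id and entry point below are purely illustrative):

```python
import gymnasium as gym
from gymnasium.envs.registration import register

# Pendulum-v1's step() never sets terminated/truncated itself; it is registered with
# max_episode_steps=200, so gym.make() adds a TimeLimit wrapper that sets truncated=True.
print(gym.make("Pendulum-v1").spec.max_episode_steps)  # 200

# For a custom env, the same effect can come from registration (id/entry point illustrative)...
register(id="TinyGrid-v0", entry_point="my_package.envs:TinyGridEnv", max_episode_steps=200)
# ...or from wrapping an instance directly:
# env = gym.wrappers.TimeLimit(TinyGridEnv(), max_episode_steps=200)
```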
Well, I found what caused my problem of not seeing any logs on tensorboard nor on screen (stdout): calling PPO's learn like this:
`agent.learn(total_timesteps=num_steps, log_interval=100, tb_log_name=name)`
This worked fine with A2C and I saw all the logs. PPO didn't log, as I said, until I changed the code to:
`agent.learn(total_timesteps=num_steps, log_interval=1, tb_log_name=name)`
or removed `log_interval` completely.
I don't know why, but I guess that PPO logs every 2048 steps?
So this helped me.
Removing `log_interval` defaults to `log_interval=1`.
Below is the code of the concerned loop; why would 100 fail in your case?
Which value did you set for `total_timesteps`? Could you identify whether you get through the `if` condition at line 268?
Sorry for the late response.
The value for `total_timesteps` was 100_000.
What I think the problem is: if I put `log_interval=1`, the first output I get is as follows:
Now I don't know why it says 2048 timesteps at the first logging, but I think it tries to log after 2048 * log_interval steps, because similarly when `log_interval=2` the same happens, just with 4096 steps at the first output logged to the screen.
As you asked, I did check, and I do get through the `if` condition at line 268, as you would expect. The only occurrence of the number 2048 that I could see in the debugger is the `n_rollout_steps` variable.
This is, I guess, why I can only see output after 2048 steps. I am not sure that this is exactly what happens, but it is my guess.
I think that `self.collect_rollouts(...)` just does 2048 steps because of the `n_rollout_steps` variable. That is probably why it takes 2048 steps for one iteration.
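That is consistent with how on-policy logging works: one iteration collects `n_steps * n_envs` timesteps, and a dump only happens every `log_interval` iterations. A back-of-the-envelope sketch (assuming the default `n_steps` values and the settings quoted above) also shows why A2C kept logging with the same `log_interval`:

```python
# Rough illustration of on-policy logging cadence (not SB3 source code):
# one "iteration" = n_steps * n_envs timesteps; a log dump happens every log_interval iterations.
total_timesteps = 100_000               # value used above
defaults = {"PPO": 2048, "A2C": 5}      # default n_steps per algorithm

for algo, n_steps in defaults.items():
    for log_interval in (1, 2, 100):
        first_dump = log_interval * n_steps  # assuming a single env (n_envs = 1)
        status = "never reached" if first_dump > total_timesteps else "reached"
        print(f"{algo}, log_interval={log_interval:>3}: first dump at {first_dump:>6} steps ({status})")

# PPO with log_interval=100 needs 204_800 steps > 100_000, so nothing gets logged;
# with log_interval=1 or 2 the first dump is at 2048 / 4096 steps, matching the outputs above.
# A2C's tiny default n_steps (5) is why it logged fine even with log_interval=100.
```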
🐛 Bug
When using tensorboard integration with SAC, no data are written to the events file. The model trains without problems and the metrics are correctly stored in the `self.logger.name_to_value` dictionary of the model. However, the events.out.tfevents file produced for tensorboard does not contain any of those data, and I have checked that its size is very small, approximately 88 bytes. The same issue does not happen when using the PPO algorithm with the exact same code and configuration: it normally produces the events.out.tfevents file with 5649 bytes and the metrics are shown on tensorboard.
I have seen a similar issue here: https://github.com/DLR-RM/stable-baselines3/issues/1419 but it is using an older version. I have found a workaround for now using a callback, and I log the train metrics manually using the `self.logger.name_to_value` dictionary. However, this is a very strange issue. I am using a custom gymnasium environment and an alpha version of stable-baselines3 to be compatible with gymnasium. Thank you for maintaining this library!
Code example
Relevant log output / Error message
System Info
Python 3.8.16
Stable-Baselines3 2.0.0a13, installed via `pip install sb3_contrib`
Gymnasium 0.28.1
PyTorch 2.0.1+cu117
Tensorflow 2.12.0
Tensorboard 2.12.3
Numpy 1.24.3
Conda environment; libraries installed with: conda 23.3.1, pip 23.1.2
Checklist