tobiabir opened this issue 1 year ago
Hello, have you considered callbacks as an alternative (see the doc and the section on TensorBoard)? They should allow you to log every n steps or every k iterations.
Hi.
Thank you for the quick answer and for that pointer!
Indeed, with a custom callback, logging on a step basis can be done. I quickly tested this on SAC with the following callback.
```python
import stable_baselines3 as sb3


class TensorboardCallback(sb3.common.callbacks.BaseCallback):
    def __init__(self, log_interval):
        super().__init__()
        self.log_interval = log_interval

    def _on_step(self):
        # Dump the accumulated logs every `log_interval` environment steps.
        if self.model.num_timesteps % self.log_interval == 0:
            self.model._dump_logs()
        # Returning True keeps training running.
        return True
```
Some minor drawbacks remain with this approach:

- `log_interval` in `learn` still has to be set to `np.inf` to suppress the built-in episode-based logging (see the sketch below).
- The logs of the final steps are not dumped at the end of training unless one also overrides `._on_training_end`.
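For completeness, a minimal usage sketch of the workaround (the environment and the interval values are arbitrary examples; `TensorboardCallback` is the class above, and the `np.inf` passed as `log_interval` effectively disables the built-in episode-based logging):

```python
import numpy as np
import stable_baselines3 as sb3

model = sb3.SAC("MlpPolicy", "Pendulum-v1")
model.learn(
    total_timesteps=10_000,
    log_interval=np.inf,  # disable the built-in episode-based logging
    callback=TensorboardCallback(log_interval=1_000),  # dump every 1000 steps
)
```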
Would you still consider some of the proposed changes? I think it could clean things up and improve consistency.
At least I think the documentation of `log_interval` should be fixed, as it is currently incorrect for on-policy algorithms.
Adding a step-based logging callback to the collection provided by sb3 would be a good addition, I think. Logging more often shouldn't be a problem (or is solvable as you describe).
Could you elaborate on your vision for this? What set of features should this callback support? Should it be a minimal callback for step-based logging only, or a more general logging callback supporting all sorts of logging applications?
> Could you elaborate on your vision for this?

Have a simple `LogEveryNSteps` callback that calls `self.logger.dump()` every n steps (n calls to `env.step()`; it might correspond to more than one step when using multiple envs, as shown in the doc).
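A minimal sketch of what such a callback could look like (not part of sb3 at the time of writing; the counter-based check instead of a plain modulo is my assumption, to handle `num_timesteps` growing by `n_envs` per call):

```python
from stable_baselines3.common.callbacks import BaseCallback


class LogEveryNSteps(BaseCallback):
    """Dump the logger every ``every_n_steps`` environment steps."""

    def __init__(self, every_n_steps: int):
        super().__init__()
        self.every_n_steps = every_n_steps
        self.last_dump_step = 0

    def _on_step(self) -> bool:
        # num_timesteps grows by n_envs per call, so compare against
        # the last dump instead of relying on an exact modulo hit.
        if self.num_timesteps - self.last_dump_step >= self.every_n_steps:
            self.logger.dump(self.num_timesteps)
            self.last_dump_step = self.num_timesteps
        return True
```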
> a more general logging callback supporting all sorts of logging applications?

We leave general purpose/custom callbacks to the user.
🚀 Feature
At the time of writing, the logging interval is controlled by the `log_interval` argument of the `learn` method, which accepts integers. In `OnPolicyAlgorithm` this is the number of rounds (environment interaction + training steps) between logs, and in `OffPolicyAlgorithm` it is the number of episodes.

Since episodes in general do not have a fixed length, or any end at all, logging on an episode basis is not always practical (see Motivation). Could we add the capability to log on a step-based interval?
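To illustrate the current asymmetry (standard SB3 API; the concrete environments and numbers are arbitrary examples):

```python
import stable_baselines3 as sb3

# On-policy: log every 10 rounds of rollout collection + training.
sb3.PPO("MlpPolicy", "CartPole-v1").learn(total_timesteps=50_000, log_interval=10)

# Off-policy: log every 10 *episodes*, whose lengths may vary.
sb3.SAC("MlpPolicy", "Pendulum-v1").learn(total_timesteps=50_000, log_interval=10)
```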
Motivation
The main motivation is experiment tracking. It is good practice to run experiments multiple times with different random seeds and display training plots with confidence intervals or min and max. If you want to plot against environment steps (e.g. when you are interested in sample complexity), you cannot really do that properly unless the logs are recorded on a step basis: with variable episode lengths, the logs of different runs will not be aligned, and it is difficult to compute the confidence intervals (e.g. what should the interval at step x be if one run has a log at step x - 3 and another at step x + 5?). So it would be great if this could be added.
Note also that the documentation currently states for all algorithms that `log_interval` is "The number of episodes before logging.", which is not true for on-policy algorithms. Issue #725 is related to that.

Pitch
I propose we change the `OffPolicyAlgorithm` case to be the same as the `OnPolicyAlgorithm` case. Logging on a step basis can then be done by using a step-based `train_freq`. This has the additional benefit of more consistency between on-policy and off-policy algorithms.

While we are at it, I also propose to move logging to after training in both `OnPolicyAlgorithm` and `OffPolicyAlgorithm`. That way we get all the information of one round in the same log; as it is, the information of the last training steps will not be logged.

Alternatives
The least invasive (to sb3) alternative is to modify the environments of interest to have a fixed episode length. However, this might not be practical for all environments and seems ugly.
Another alternative would be to allow `log_interval` to be a tuple (e.g. `(4, "episodes")`) like `train_freq`; the existing `train_freq` convention is shown below for reference. This also seems ugly.

Update: Custom callback. See the discussion below.
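For reference, a sketch of the existing `train_freq` tuple convention that this alternative would mirror (standard SB3 API; the environment and values are just examples):

```python
import stable_baselines3 as sb3

# Off-policy algorithms already accept a (frequency, unit) tuple here,
# with the unit being "step" or "episode".
model = sb3.SAC("MlpPolicy", "Pendulum-v1", train_freq=(4, "episode"))
```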