InonS opened this issue 1 year ago
Thanks for suggesting @InonS.
This is basically addressing the same use case as in #568, correct?
@ainoam not a perfect match, as I understand it. I'm looking for an integration with an Early Stopping callback, in particular. #568 (if I'm not wrong) is looking to present more than one metric, aligned on the min/max/last of the first one.
I'm looking for an integration with an Early Stopping callback, in particular.
I'm not aware of any standard early stopping callback. The easiest way is to use something similar to the following snippet in whichever early stopping callback you are using:
from clearml import Logger
Logger.current_logger().report_scalar(title="metric", series="best", value=123, iteration=k)
Notice this is a singleton; you can always call it, with no need to pass the Task/Logger object around.
When using ClearML hyperparameter optimization, you can find the target metric in the HPO Task, along with the hyperparameter analysis.
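For instance, a minimal HPO sketch wired to the metric/best series reported above (the base_task_id and the hyperparameter name here are placeholders, not from this thread):

from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformParameterRange

# a minimal sketch, assuming a template training task that reports the
# "metric"/"best" series as in the snippet above
optimizer = HyperParameterOptimizer(
    base_task_id="<template training task id>",  # placeholder
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="metric",  # the title reported above
    objective_metric_series="best",   # the series reported above
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    max_number_of_concurrent_tasks=2,
)
optimizer.start_locally()
optimizer.wait()
optimizer.stop()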
WDYT?
@InonS Any thoughts on the proposed workflow?
Hi, @ainoam!
Again, I'm not sure we're talking about the same thing.
The issue we're having is that when we log a scalar, we can choose to present in the UI a column of the min, max, or last value. When working with early stopping with patience, this means that the minimal loss was reached some patience epochs before the last one. Let's call this epoch the best_epoch. Now we would like the UI to compare several experiments based on some other metric (not the loss), but each experiment has a different best_epoch. What we want is metric[best_epoch], so to speak. The problem is that the best_epoch is not the last epoch, and metric[best_epoch] may be none of max(metric), min(metric), or metric[-1] (i.e. "last").
If this is still confusing, please let me know, and we can pick up communications some other way.
Thanks for clarifying @InonS,
I think we are actually slowly aligning :)
The suggestion above was that you "copy" the value of metric[best_epoch] into a new series: metric/best (or, if you prefer, you can go with a new metric: metric_best), which will only have a single value, and hence its min/max/last values are all the same (BTW, if the iteration is of less import, you can use Logger.report_single_value()).
This means that in the UI you will have a new column option for that new metric.
Does this make sense?
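A minimal sketch of this suggestion, assuming best_epoch and a per-epoch metric_history were already tracked by your early stopping logic (both names are placeholders):

from clearml import Logger

logger = Logger.current_logger()

# a single-point series: its min/max/last column values all coincide
logger.report_scalar(title="metric", series="best",
                     value=metric_history[best_epoch], iteration=best_epoch)

# or, if the iteration number is of less import, report a single value
logger.report_single_value(name="metric_best", value=metric_history[best_epoch])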
Sorry for the delay, @ainoam.
If I properly understood your suggestion, it would require us to manually detect which epoch was the best_epoch and then manually log the metric at that epoch. I was hoping that the ClearML SDK could "automagically" do this for me (the user).
from tensorflow import keras
from clearml import Logger


class EarlyStoppingTriggerCallback(keras.callbacks.Callback):
    """
    Takes an `EarlyStopping` callback and a `Metric` as ctor args and reports
    the stopping epoch and the metric value if training ended due to early stopping
    """

    def __init__(self, early_stopping_callback, metric, is_val_metric=True):
        super().__init__()
        self.early_stopping_callback = early_stopping_callback
        # use the ctor args directly -- they are not stored on `self`
        self.metric_name = ("val_" if is_val_metric else "") + metric.name

    def on_train_end(self, logs=None):
        # `stopped_epoch` is non-zero only if EarlyStopping actually triggered
        if self.early_stopping_callback.stopped_epoch > 0:
            logger = Logger.current_logger()
            logger.report_single_value("early_stopping_epoch", self.early_stopping_callback.stopped_epoch + 1)
            logger.report_single_value(f"early_stopping({self.metric_name})", logs[self.metric_name])
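A hypothetical wiring of this callback into model.fit (the model, training data, and monitored metric name are assumptions, not from this thread):

# assumes the model was compiled with metrics=[keras.metrics.AUC(name="auc")]
early_stopping = keras.callbacks.EarlyStopping(monitor="val_auc", mode="max", patience=5)
trigger = EarlyStoppingTriggerCallback(early_stopping, keras.metrics.AUC(name="auc"))
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=[early_stopping, trigger])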
Proposal Summary
Extend the experiment table metrics checkboxes to include a best checkpoint option.

Motivation
Many Deep Learning training regularization strategies build on early stopping with patience, using epoch checkpoints. In such cases, the framework will save the output model taken at the best checkpoint, and therefore the metrics displayed on the leaderboard should be reported at that same epoch (neither min, nor max, nor last).

Related Discussion
Circumventing the issue by manually uploading best metrics to some generic configuration field (e.g. see #186) is an inferior UX; a sketch of that workaround follows below.

Credit: @adar21a
Edit: Related but not identical to #568
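For reference, the configuration-field workaround mentioned above might look roughly like this (a sketch only; best_epoch and best_metric are assumed to come from the training loop, and whether #186 uses exactly this mechanism is not confirmed here):

from clearml import Task

task = Task.current_task()
# stash the best-epoch values in a generic configuration section --
# visible in the UI, but not usable as a leaderboard metric column
task.set_parameter("Best/epoch", best_epoch)
task.set_parameter("Best/metric", best_metric)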