InonS opened this issue 1 year ago
Thanks for suggesting @InonS.
This is basically addressing the same use case as in #568, correct?
@ainoam not a perfect match, as I understand it. I'm looking for an integration with an Early Stopping callback, in particular. #568 (if I'm not wrong) is looking to present more than one metric, aligned on the min/max/last of the first one.
I'm looking for an integration with an Early Stopping callback, in particular.
I'm not aware of any standard early stopping callback. The easiest way is to use something similar to the following snippet in whichever early stopping callback you are using:
from clearml import Logger
Logger.current_logger().report_scalar(title="metric", series="best", value=123, iteration=k)
Notice this is a singleton; you can always call it, with no need to pass the Task/Logger object around.
When using ClearML hyperparameter optimization, you can find the target metric in the HPO Task, along with the hyperparameter analysis.
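For instance, a minimal HPO sketch wired to the metric/best series reported above (the base_task_id and the hyperparameter name here are placeholders, not from this thread):

from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformParameterRange

# a minimal sketch, assuming a template training task that reports the
# "metric"/"best" series as in the snippet above
optimizer = HyperParameterOptimizer(
    base_task_id="<template training task id>",  # placeholder
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="metric",  # the title reported above
    objective_metric_series="best",   # the series reported above
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    max_number_of_concurrent_tasks=2,
)
optimizer.start_locally()
optimizer.wait()
optimizer.stop()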
WDYT?
@InonS Any thoughts on the proposed workflow?
Hi, @ainoam!
Again, I'm not sure we're talking about the same thing.
The issue we're having is that when we log a scalar, we can choose to present in the UI a column of the min, max, or last value. When working with early stopping with patience, this means that the minimal loss was reached some patience epochs before the last one. Let's call this epoch the best_epoch. Now we would like the UI to compare several experiments based on some other metric (not the loss), but each experiment has a different best_epoch. What we want is metric[best_epoch], so to speak. The problem is that the best_epoch is not the last epoch, and metric[best_epoch] may be none of max(metric), min(metric), or metric[-1] (i.e. "last").
If this is still confusing, please let me know, and we can pick up communications some other way.
Thanks for clarifying @InonS,
I think we are actually slowly aligning :)
The suggestion above was that you "copy" the value of metric[best_epoch] into a new series: metric/best (or, if you prefer, you can go with a new metric: metric_best), which will only have a single value, and hence its min/max/last values are all the same (BTW, if the iteration is of less import, you can use Logger.report_single_value()).
This means that in the UI you will have a new column option for that new metric.
Does this make sense?
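A minimal sketch of this suggestion, assuming best_epoch and a per-epoch metric_history were already tracked by your early stopping logic (both names are placeholders):

from clearml import Logger

logger = Logger.current_logger()

# a single-point series: its min/max/last column values all coincide
logger.report_scalar(title="metric", series="best",
                     value=metric_history[best_epoch], iteration=best_epoch)

# or, if the iteration number is of less import, report a single value
logger.report_single_value(name="metric_best", value=metric_history[best_epoch])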
Sorry for the delay, @ainoam.
If I properly understood your suggestion, it would require us to manually detect which epoch was the best_epoch and then manually log the metric at that epoch. I was hoping that the ClearML SDK could "automagically" do this for me (the user).
from tensorflow import keras
from clearml import Logger


class EarlyStoppingTriggerCallback(keras.callbacks.Callback):
    """
    Takes an `EarlyStopping` callback and a `Metric` as ctor args and reports
    the stopping epoch and the metric value if training ended due to early stopping
    """

    def __init__(self, early_stopping_callback, metric, is_val_metric=True):
        super().__init__()
        self.early_stopping_callback = early_stopping_callback
        # use the ctor args directly -- they are not stored on `self`
        self.metric_name = ("val_" if is_val_metric else "") + metric.name

    def on_train_end(self, logs=None):
        # `stopped_epoch` is non-zero only if EarlyStopping actually triggered
        if self.early_stopping_callback.stopped_epoch > 0:
            logger = Logger.current_logger()
            logger.report_single_value("early_stopping_epoch", self.early_stopping_callback.stopped_epoch + 1)
            logger.report_single_value(f"early_stopping({self.metric_name})", logs[self.metric_name])
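A hypothetical wiring of this callback into model.fit (the model, training data, and monitored metric name are assumptions, not from this thread):

# assumes the model was compiled with metrics=[keras.metrics.AUC(name="auc")]
early_stopping = keras.callbacks.EarlyStopping(monitor="val_auc", mode="max", patience=5)
trigger = EarlyStoppingTriggerCallback(early_stopping, keras.metrics.AUC(name="auc"))
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=[early_stopping, trigger])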
Proposal Summary
Extend the experiment table metrics checkboxes to include a best checkpoint option.

Motivation
Many Deep Learning training regularization strategies build on early stopping with patience, using epoch checkpoints. In such cases, the framework will save the output model taken at the best checkpoint, and therefore the metrics displayed on the leaderboard should be reported at that same epoch (neither min, nor max, nor last).

Related Discussion
Circumventing the issue by manually uploading best metrics to some generic configuration field (e.g. see #186) is an inferior UX; a sketch of that workaround follows below.

Credit: @adar21a
Edit: Related but not identical to #568
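For reference, the configuration-field workaround mentioned above might look roughly like this (a sketch only; best_epoch and best_metric are assumed to come from the training loop, and whether #186 uses exactly this mechanism is not confirmed here):

from clearml import Task

task = Task.current_task()
# stash the best-epoch values in a generic configuration section --
# visible in the UI, but not usable as a leaderboard metric column
task.set_parameter("Best/epoch", best_epoch)
task.set_parameter("Best/metric", best_metric)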