automl / amltk

A build-it-yourself AutoML Framework
https://automl.github.io/amltk/
BSD 3-Clause "New" or "Revised" License

refactor(Metric, Trial): Cleanup of metrics and `Trial` #242

Closed: eddiebergman closed this 10 months ago

eddiebergman commented 10 months ago

This PR is mostly a refactor of Metric and the places where it is used, so that the next steps can go directly from a Metric to an sklearn.metrics._scorer._Scorer or _MultimetricScorer. Notably, you can attach a fn to a Metric and call as_sklearn_scorer(...) -> _Scorer on it. The same applies to the new MetricCollection(Mapping[str, Any]), where as_sklearn_scorer(...) returns a _MultimetricScorer.

This enables the following improvement for sklearn integration:

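from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

from amltk.optimization import Metric, Trial
from amltk.pipeline import Component, Node

# NOTE: assumes `dataset` is an OpenML dataset loaded beforehand,
# e.g. with `openml.datasets.get_dataset(...)`

def evaluate(trial: Trial, pipeline: Node) -> Trial.Report:
    # Configure the pipeline with this trial's config and build an sklearn Pipeline
    model = (
        pipeline.configure(trial.config).build("sklearn").set_output(transform="pandas")
    )
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)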
    with trial.profile("fit"):
        model.fit(X, y)

    # Convert the metrics to an sklearn scorer
    # Works seamlessly with single or multiple metrics defined
    # Could also do `acc = trial.metrics["accuracy"].as_sklearn_scorer()`
    scorer = trial.metrics.as_sklearn_scorer()
    with trial.profile("scoring"):
        scores = scorer(model, X, y)

    return trial.success(**scores)

# Custom metric: negated accuracy, so lower is better
# (arguments follow sklearn's `(y_true, y_pred)` order)
def custom_acc(y_true, y_pred) -> float:
    return -(y_pred == y_true).mean()

# Will use `get_scorer` to get the scorer
metric = Metric("accuracy", bounds=(0, 1), minimize=False)

# Will use `make_scorer` using the information here
metric_2 = Metric("custom", bounds=(-1, 0), minimize=True, fn=custom_acc)  # negated accuracy lies in [-1, 0]

pipeline = (
    Component(OrdinalEncoder)
    >> Component(
        RandomForestClassifier,
        space={"n_estimators": (10, 100)},
        config={"criterion": "gini"},
    )
)
history = pipeline.optimize(
    evaluate,
    max_trials=5,
    metric=[metric, metric_2],
)

print(history.df(profiles=False))
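The split described in the comments above (use get_scorer for a named metric, make_scorer when a fn is attached) can be sketched with plain scikit-learn. This is a minimal sketch under stated assumptions: SketchMetric and as_multi_scorer are hypothetical stand-ins, not amltk's actual Metric or MetricCollection code, and a plain dict of results replaces sklearn's private _MultimetricScorer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import get_scorer, make_scorer


# Hypothetical stand-in for amltk's `Metric`, NOT the library's implementation
@dataclass(frozen=True)
class SketchMetric:
    name: str
    bounds: Tuple[float, float]
    minimize: bool
    fn: Optional[Callable[..., float]] = None

    def as_sklearn_scorer(self):
        if self.fn is None:
            # No custom function: resolve the name via sklearn's scorer registry
            return get_scorer(self.name)
        # Custom function: wrap it; sklearn multiplies the result by -1
        # when greater_is_better=False, so minimized metrics come back negated
        return make_scorer(self.fn, greater_is_better=not self.minimize)


def as_multi_scorer(metrics):
    """Hypothetical helper: one callable returning {metric name: score}."""
    scorers = {m.name: m.as_sklearn_scorer() for m in metrics}

    def score(estimator, X, y) -> Dict[str, float]:
        return {name: float(s(estimator, X, y)) for name, s in scorers.items()}

    return score


def error_rate(y_true, y_pred) -> float:
    # Fraction of wrong predictions, in [0, 1]; lower is better
    return float(np.mean(y_true != y_pred))


# Usage: a constant classifier that always predicts the majority class 0
X = np.zeros((4, 1))
y = np.array([0, 0, 0, 1])
clf = DummyClassifier(strategy="most_frequent").fit(X, y)

scorer = as_multi_scorer([
    SketchMetric("accuracy", bounds=(0, 1), minimize=False),
    SketchMetric("error", bounds=(0, 1), minimize=True, fn=error_rate),
])
scores = scorer(clf, X, y)
print(scores)  # {'accuracy': 0.75, 'error': -0.25}
```

One consequence of leaning on sklearn scorers: they always follow a higher-is-better convention, so a metric declared with minimize=True is reported with its sign flipped (error comes back as -0.25), and any bounds check on the scored values has to account for that.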

Feel free to skip most of the code review, as it's mostly refactoring. There will be more work towards making this workflow as streamlined as possible!

TODO:


While doing so, it also became apparent that the Trial class was too dynamic and had too much early feature scope. This PR therefore also reduces the attributes available on a Report, always stores the reported values as a dict keyed by metric name, and removes the Metric.Value class.
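The slimmer report can be pictured as a small immutable record whose results are plain floats keyed by metric name, rather than one wrapper object per value. This is a minimal sketch under that assumption: SketchReport and success are hypothetical names, not amltk's Trial.Report or Trial.success, and the key check is an illustrative design choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


# Hypothetical sketch, NOT amltk's actual `Trial.Report`
@dataclass(frozen=True)
class SketchReport:
    name: str
    status: str
    # Plain floats keyed by metric name, instead of a wrapper object per value
    values: Dict[str, float] = field(default_factory=dict)


def success(name: str, metric_names: Iterable[str], **values: float) -> SketchReport:
    """Hypothetical stand-in for `trial.success(**scores)`."""
    expected = set(metric_names)
    missing = expected - set(values)
    unknown = set(values) - expected
    if missing or unknown:
        # Reject reports whose keys don't match the declared metrics
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
    return SketchReport(name=name, status="success", values=dict(values))


report = success("trial-0", ["accuracy", "custom"], accuracy=0.93, custom=-0.93)
print(report.values)  # {'accuracy': 0.93, 'custom': -0.93}
```

Keying values by metric name keeps a report directly compatible with the dict returned by a multi-metric scorer, so the scores can be splatted straight into success(...).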