automl / amltk

A build-it-yourself AutoML Framework
https://automl.github.io/amltk/
BSD 3-Clause "New" or "Revised" License

refactor(Metric, Trial): Cleanup of metrics and `Trial` #242

Closed · eddiebergman closed this 8 months ago

eddiebergman commented 8 months ago

This PR is mainly just a refactor of Metric and the places it is used, to make it easier to go directly from a Metric to sklearn.metrics._scorer._Scorer or _MultimetricScorer in the next steps. Notably, you can attach a fn to a Metric and call as_sklearn_scorer(...) -> _Scorer on it. The same applies to the new MetricCollection(Mapping[str, Any]), on which as_sklearn_scorer(...) returns a _MultimetricScorer.
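As a minimal sketch of just that conversion (the constructor arguments mirror the full example below; the custom function and variable names are purely illustrative):

from amltk.optimization import Metric

# A metric known to sklearn by name resolves to an existing scorer.
accuracy = Metric("accuracy", bounds=(0, 1), minimize=False)
acc_scorer = accuracy.as_sklearn_scorer()  # -> sklearn.metrics._scorer._Scorer

# A metric with an attached fn gets wrapped into a scorer instead.
def error_rate(y_pred, y_true) -> float:
    return (y_pred != y_true).mean()

err = Metric("error_rate", bounds=(0, 1), minimize=True, fn=error_rate)
err_scorer = err.as_sklearn_scorer()

# Either result follows the usual scorer protocol: score = acc_scorer(model, X, y)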

This enables the following improvement for sklearn integration:

from amltk.optimization import Metric, Trial
from amltk.pipeline import Component, Node
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

def evaluate(trial: Trial, pipeline: Node) -> Trial.Report:
    model = (
        pipeline.configure(trial.config).build("sklearn").set_output(transform="pandas")
    )
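    # "dataset" is assumed to be an openml.OpenMLDataset fetched elsewhere;
    # its get_data() returns (X, y, categorical_indicator, attribute_names).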
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
    with trial.profile("fit"):
        model.fit(X, y)

    # Convert the metrics to an sklearn scorer
    # Works seamlessly with single or multiple metrics defined
    # Could also do `acc = trial.metrics["accuracy"].as_sklearn_scorer()`
    scorer = trial.metrics.as_sklearn_scorer()
    with trial.profile("scoring"):
        scores = scorer(model, X, y)

    return trial.success(**scores)

# Custom metric
def custom_acc(y_pred, y_true) -> float:
    return -(y_pred == y_true).mean()

# Will use `get_scorer` to get the scorer
metric = Metric("accuracy", bounds=(0, 1), minimize=False)

# Will use `make_scorer` using the information here
metric_2 = Metric("custom", bounds=(-1, 0), minimize=True, fn=custom_acc)

pipeline = (
    Component(OrdinalEncoder)
    >> Component(
        RandomForestClassifier,
        space={"n_estimators": (10, 100)},
        config={"criterion": "gini"},
    )
)
history = pipeline.optimize(
    evaluate,
    max_trials=5,
    metric=[metric, metric_2],
)

print(history.df(profiles=False))

Feel free to skip most of the code review, as it's mostly just refactoring. There will be more work towards making this workflow as streamlined as possible!

TODO:


While doing so, it also became more apparent that the Trial class is a bit too dynamic and had too much early feature scope. Part of this work was also to reduce the attributes available on a Report, always expose the actual metric values as a dict keyed by metric name, and remove the Metric.Value class.
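For illustration, a rough sketch of how the reported values might then be read back, assuming the History is iterable over reports and that the per-metric values live under a values attribute (that attribute name is an assumption here, not a confirmed part of the API):

for report in history:
    # Assumed attribute: a plain dict keyed by metric name,
    # e.g. {"accuracy": ..., "custom": ...}, replacing the old
    # Metric.Value objects.
    print(report.values["accuracy"], report.values["custom"])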