automl / amltk

A build-it-yourself AutoML Framework
https://automl.github.io/amltk/
BSD 3-Clause "New" or "Revised" License

feat(sklearn): Provide a standard CVEvaluator #244

Closed eddiebergman closed 8 months ago

eddiebergman commented 8 months ago

This PR mainly introduces amltk.sklearn.CVEvaluation. It creates a Task[[Trial, Node], Trial.Report] that can be optimized against for a prototypical sklearn setup.

from amltk.sklearn import CVEvaluation
from amltk.pipeline import Component, request
from amltk.optimization import Metric
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import get_scorer

pipeline = Component(
    RandomForestClassifier,
    config={"random_state": request("random_state")},
    # RandomForestClassifier's split-quality parameter is named "criterion"
    space={"n_estimators": (10, 100), "criterion": ["gini", "entropy"]},
)
evaluator = CVEvaluation(
    X,
    y,
    cv=3,
    additional_scorers={"f1": get_scorer("f1")},
    store_models=False,
    train_score=True,
)

history = pipeline.optimize(
    target=evaluator,
    metrics=Metric("accuracy", minimize=False, bounds=(0, 1)),
    n_workers=4,
)
print(history.df())

Namely, its parameters and features:

The gain is that tasks are set up correctly: data is serialized before being passed to workers, memory is managed so that not all splits and models are held in memory at once, and sklearn is used properly, for example by setting seeds.
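To make those three concerns concrete, here is a minimal sketch in plain scikit-learn. It is not the CVEvaluation implementation, only an illustration of the pattern under stated assumptions: the helper name `evaluate_cv`, the pickle file layout, and the synthetic dataset are all hypothetical and chosen for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def evaluate_cv(estimator, data_path, n_splits=3, seed=0):
    """Hypothetical helper: cross-validate `estimator`, one fold at a time."""
    # The worker loads the data from disk rather than receiving it with the task.
    with open(data_path, "rb") as f:
        X, y = pickle.load(f)

    # Seed the splitter so every configuration sees the same folds.
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X, y):
        # Fresh, seeded clone per fold. Only the score is kept; the fitted
        # model is released when `model` is rebound on the next iteration.
        model = clone(estimator).set_params(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return {"mean_accuracy": float(np.mean(scores)), "fold_accuracies": scores}


# Synthetic data, written to disk once so that only the path travels to workers.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
data_path = Path(tempfile.mkdtemp()) / "data.pkl"
with open(data_path, "wb") as f:
    pickle.dump((X, y), f)

result = evaluate_cv(RandomForestClassifier(n_estimators=20), data_path)
print(result["mean_accuracy"])
```

Because both the splitter and the estimator are seeded, calling `evaluate_cv` again with the same arguments reproduces the same fold scores, which is what makes trials comparable across workers.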


Additional


TODOs