This PR mainly introduces `amltk.sklearn.CVEvaluation`. This is something that can create a `Task[[Trial, Node], Trial.Report]` that can be optimized against for a prototypical sklearn setup.
```python
from amltk.sklearn import CVEvaluation
from amltk.pipeline import Component, request
from amltk.optimization import Metric
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import get_scorer

# Any X, y will do here; this dataset is just an illustrative choice,
# not part of the PR itself.
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)

pipeline = Component(
    RandomForestClassifier,
    config={"random_state": request("random_state")},
    space={"n_estimators": (10, 100), "criterion": ["gini", "entropy"]},
)

evaluator = CVEvaluation(
    X,
    y,
    cv=3,
    additional_scorers={"f1": get_scorer("f1")},
    store_models=False,
    train_score=True,
)

history = pipeline.optimize(
    target=evaluator,
    metrics=Metric("accuracy", minimize=False, bounds=(0, 1)),
    n_workers=4,
)
print(history.df())
```
Namely, its parameters feature:

- Sensible defaults for splitting based on `strategy: Literal["holdout", "cv"]`, or pass a custom splitter.
- Options to `train_score: bool = False` or `store_models: bool = False`.
- Pass `additional_scorers: dict[str, _Scorer]` for metrics to track other than the optimization ones attached to the `Trial`.
- `params: dict[str, Any]`, which uses sklearn's new metadata routing for things like `sample_weight` and scorer params (see the sketch after this list).
- This uses `scikit-learn>=1.4` and likely means this will enforce a lower bound. I do not want to maintain backwards compatibility.
- `task_hint: bool | None = None` to hint at the task type. This comes up in the AutoML Benchmark, where sklearn incorrectly identifies some targets as regression.
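To make the `params` point concrete, the sketch below shows the plain scikit-learn >= 1.4 metadata-routing mechanics it builds on. Everything here is stock sklearn with made-up data; how `CVEvaluation` forwards `params` internally is not shown.

```python
import numpy as np
from sklearn import set_config
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Metadata routing must be explicitly enabled in scikit-learn >= 1.4.
set_config(enable_metadata_routing=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
sample_weight = rng.uniform(size=100)

# An estimator must explicitly request the metadata it wants routed to fit().
clf = RandomForestClassifier(n_estimators=10).set_fit_request(sample_weight=True)

# With routing enabled, entries in `params` reach whichever step requested them.
results = cross_validate(clf, X, y, cv=3, params={"sample_weight": sample_weight})
print(results["test_score"])
```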
What this gains is correctly setting up tasks: serializing data to be passed to workers, managing memory (i.e., not holding all splits/models in memory at once), and interacting with sklearn properly, such as setting seeds.
Additional
Move away from `StoredValue` to just having a `Stored[T]`, on which you can call `load()`. Useful for situations where you don't care about where it lives, just give me the `T`.
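As a rough illustration of that contract, here is a minimal conceptual sketch. This is not amltk's actual implementation; the `where`/`read` fields are assumptions purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass
class Stored(Generic[T]):
    """A reference to a T that lives somewhere else (conceptual stand-in)."""

    where: Path                    # callers never need to inspect this
    read: Callable[[Path], T]      # how to deserialize from that location

    def load(self) -> T:
        """Just give me the T, wherever it happens to be stored."""
        return self.read(self.where)
```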
TODOs
- Test parameters like `groups`, sample weights, and scorers. Will move this to its own issue for the sake of marching onwards.
- Test clustering; theoretically this shouldn't be problematic except for the task-type identification part. However, `cross_validate` doesn't care, as the sketch below shows.
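As a quick sanity check of that last point, plain `cross_validate` will happily evaluate a clusterer with no labels at all (random data used here for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_validate

X = np.random.default_rng(0).normal(size=(100, 5))

# No y needed: with no explicit scoring, cross_validate falls back to
# KMeans.score(), i.e. negative inertia on each held-out split.
results = cross_validate(KMeans(n_clusters=3, n_init="auto"), X, cv=3)
print(results["test_score"])
```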