This PR is mainly just refactoring `Metric` and the places it was used, to make it easier to go directly from `Metric` to `sklearn.metrics._scorer._Scorer` or `_MultimetricScorer` in the next steps.
Notably, you can attach an `fn` to a `Metric` and call `as_sklearn_scorer(...) -> _Scorer` on it. This also applies to the new `MetricCollection(Mapping[str, Metric])`, where you can call `as_sklearn_scorer(...) -> _MultimetricScorer`.
This enables the following improvement for sklearn integration:
```python
def evaluate(trial: Trial, pipeline: Node) -> Trial.Report:
    model = (
        pipeline.configure(trial.config).build("sklearn").set_output(transform="pandas")
    )
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
    with trial.profile("fit"):
        model.fit(X, y)

    # Convert the metrics to an sklearn scorer
    # Works seamlessly with single or multiple metrics defined
    # Could also do `acc = trial.metrics["accuracy"].as_sklearn_scorer()`
    scorer = trial.metrics.as_sklearn_scorer()
    with trial.profile("scoring"):
        scores = scorer(model, X, y)

    return trial.success(**scores)


# Custom metric
def custom_acc(y_pred, y_true) -> float:
    return -(y_pred == y_true).mean()


# Will use `get_scorer` to get the scorer
metric = Metric("accuracy", bounds=(0, 1), minimize=False)

# Will use `make_scorer` using the information here
metric_2 = Metric("custom", bounds=(0, 1), minimize=True, fn=custom_acc)

pipeline = (
    Component(OrdinalEncoder)
    >> Component(
        RandomForestClassifier,
        space={"n_estimators": (10, 100)},
        config={"criterion": "gini"},
    )
)

history = pipeline.optimize(
    evaluate,
    max_trials=5,
    metric=[metric, metric_2],
)

print(history.df(profiles=False))
```
Feel free to skip most of the code review as it's mostly just refactoring.
There will be more work towards making this workflow as streamlined as possible!
TODO:
Implement tests for the sklearn functionality of metrics. Right now they have only been tested manually.
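For reference, a rough sketch of what such a test might look like (the `amltk.optimization` import path and the exact return value of `as_sklearn_scorer()` are assumptions here, not the final test):

```python
import numpy as np
import pytest
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import get_scorer

from amltk.optimization import Metric  # import path is an assumption


def test_metric_as_sklearn_scorer_matches_builtin() -> None:
    X = np.random.rand(50, 3)
    y = np.random.randint(0, 2, size=50)
    model = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

    # A `Metric` named after a builtin sklearn scorer should resolve via `get_scorer`
    metric = Metric("accuracy", bounds=(0, 1), minimize=False)
    scorer = metric.as_sklearn_scorer()

    expected = get_scorer("accuracy")(model, X, y)
    assert scorer(model, X, y) == pytest.approx(expected)
```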
While doing so, it also became more apparent that the `Trial` class is a bit too dynamic and had too much early feature scope. Part of this was also to reduce the attributes available on a `Report`, make the reported values always a dict keyed by metric name, and remove the `Metric.Value` class.
Now, the preferred way to create a `Trial` is with `create()`, although normally an optimizer will just make one for the user and it's not specified manually. This prevents them from being in too dynamic a state.
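For illustration, manually creating one might look roughly like this (the argument names are hypothetical, not the exact `create()` signature):

```python
# Hypothetical arguments; normally the optimizer constructs the Trial for you.
trial = Trial.create(
    name="manual-trial",
    config={"n_estimators": 50},
    metrics=[metric, metric_2],
    bucket=PathBucket("results/manual-trial"),
)
```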
I removed the notion of `where=` for `store()` and `retrieve()`. They will always have a `PathBucket` attached to them, and we can revisit this if needed.
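For example, inside `evaluate(...)` this might look like the following (a sketch, assuming `store()` takes a mapping of names to objects and `retrieve()` takes a single name):

```python
# Everything goes through the PathBucket attached to the trial,
# so there is no `where=` to decide the destination anymore.
trial.store({"model.pkl": model})
loaded_model = trial.retrieve("model.pkl")
```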
The `metrics` attribute is now a `MetricCollection(Mapping[str, Metric])`, essentially a dict where you can do `trial.metrics["accuracy"]`.
This matches the `Report` as well with `report.metrics["accuracy"]`. You can also access the raw values reported with `report.values["accuracy"]`.
This removed about two extraneous fields from `Report`, which was nice.
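Putting that together, reading results back out looks roughly like this (a sketch; iterating the history for `Report`s is an assumption):

```python
for report in history:
    acc_metric = report.metrics["accuracy"]  # the Metric definition
    acc_value = report.values["accuracy"]    # the raw value reported for it
    print(acc_metric, acc_value)
```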