ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Implement new calculator util: calculate_composite_score() for predict_ml() pipeline #103

Closed · ETA444 closed this issue 5 months ago

ETA444 commented 6 months ago

Composite Score Calculation

The composite score approach is used in the model_recommendation_core(), which is part of the predict_ml() pipeline for recommending the best n model(s).

It aims to synthesize multiple scoring metrics into a single metric that can be used to compare and rank models. This is particularly useful when you have multiple criteria that you consider important for your model's performance, and these criteria might have different scales or directions (i.e., for some metrics, higher is better, while for others, lower is better).

Here’s a breakdown of the calculation:

Composite Score Calculation Formula

Given a set of metrics $M$, where each metric $m \in M$ has a score $s_m$ and a weight $w_m$, the composite score $C$ for a model can be calculated as:

$$ C = \frac{\sum_{m \in M} w_m \cdot \text{adj}(s_m)}{\sum_{m \in M} w_m} $$

Where:

- $s_m$ is the score obtained for metric $m$,
- $w_m$ is the weight assigned to metric $m$, reflecting its relative importance,
- $\text{adj}(s_m)$ is an adjustment that puts $s_m$ into "higher is better" form (e.g., negation or inversion for error metrics such as RMSE).

This formula allows for a weighted synthesis of multiple performance metrics into a single, normalized score that facilitates direct comparison of models based on a balanced assessment of their performance across the prioritized criteria.
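
As a small worked example (hypothetical numbers, matching the usage example further below), take Accuracy = 0.95 with weight 5 and Precision = 0.90 with weight 1; both are already "higher is better", so $\text{adj}$ is the identity:

$$ C = \frac{5 \cdot 0.95 + 1 \cdot 0.90}{5 + 1} = \frac{5.65}{6} \approx 0.94 $$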

Concern: Handling Metrics Where Lower is Better

I am aware of the concern about metrics where a lower value indicates better performance (such as RMSE). The composite score calculation can accommodate such metrics through inversion or negation, ensuring that all metrics effectively operate in a "higher is better" framework so that the composite score remains meaningful and consistent.

When integrating such scores into the composite score calculation, the key is to ensure all metrics are on a consistent scale and direction so that the composite score effectively reflects the model's overall performance according to the prioritized criteria.

This approach allows for a nuanced comparison of models, balancing the trade-offs between different performance metrics in a way that aligns with the specific objectives and preferences for the modelling task at hand.
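
As a minimal sketch of this adjustment (illustrative only; the metric names in LOWER_IS_BETTER are assumptions, not the set used inside predict_ml()):

# Minimal sketch: put every metric into 'higher is better' form.
# LOWER_IS_BETTER is an illustrative assumption, not the library's actual list.
LOWER_IS_BETTER = {'RMSE', 'MAE'}

def adjust_metric(metric: str, score: float) -> float:
    """Negate lower-is-better metrics so that a smaller error yields a larger adjusted score."""
    return -score if metric in LOWER_IS_BETTER else score

print(adjust_metric('RMSE', 0.25))      # -0.25
print(adjust_metric('Accuracy', 0.95))  # 0.95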

Review of Metrics' Adherence to 'Higher is Better' Framework

Classification Metrics

Common classification metrics such as Accuracy, Precision, Recall, F1, and ROC AUC already follow the "higher is better" convention, so their scores can enter the composite score without adjustment.

Regression Metrics

Error-based regression metrics such as RMSE and MAE follow a "lower is better" convention and need negation or inversion before entering the composite score, while R² already follows "higher is better".

ETA444 commented 5 months ago

Implementation Summary

calculate_composite_score() is a utility function that aggregates multiple evaluation metrics into a single score by weighting each metric according to its importance. This weighted approach facilitates a balanced and comprehensive evaluation of model performance across various criteria, enhancing decision-making in model selection within the predict_ml() pipeline.

Code Breakdown

def calculate_composite_score(scores: dict, metric_weights: dict) -> float:
    # Validate input types
    if not isinstance(scores, dict) or not isinstance(metric_weights, dict):
        raise TypeError("Both 'scores' and 'metric_weights' must be dictionaries.")
    # Neither dictionary may be empty
    if not scores or not metric_weights:
        raise ValueError("'scores' and 'metric_weights' cannot be empty.")
    # Every metric in 'scores' needs a corresponding weight
    missing_metrics = set(scores.keys()) - set(metric_weights.keys())
    if missing_metrics:
        raise ValueError(f"Missing weights for metrics: {', '.join(missing_metrics)}")
    # Weighted average of the scores; wrap in try/except to surface unexpected errors
    try:
        composite_score = sum(score * metric_weights.get(metric, 0) for metric, score in scores.items()) / sum(metric_weights.values())
    except Exception as e:
        raise ValueError(f"Error in calculating composite score: {e}")
    return composite_score
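
As a quick illustration of the validation path above (hypothetical values, not output from the library), passing a score whose metric has no corresponding weight triggers the missing-metrics guard:

scores = {'Accuracy': 0.95, 'RMSE': 0.25}
metric_weights = {'Accuracy': 5}  # no weight provided for 'RMSE'
try:
    calculate_composite_score(scores, metric_weights)
except ValueError as e:
    print(e)  # expected: "Missing weights for metrics: RMSE"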

Example Usage

scores = {'Accuracy': 0.95, 'Precision': 0.90}
metric_weights = {'Accuracy': 5, 'Precision': 1}
composite_score = calculate_composite_score(scores, metric_weights)
print(f"Composite Score: {composite_score:.2f}")

Link to Full Code