ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Develop model_recommendation_core() for predict_ml() #110

Closed: ETA444 closed this issue 4 months ago

ETA444 commented 5 months ago

Title: Implement Model Recommendation Core for Automated Machine Learning Pipeline - backend of predict_ml()

Description: The model_recommendation_core() function will automate the recommendation of top-performing machine learning models based on composite scores derived from multiple evaluation metrics. This proposal outlines the functionality and design of the model recommendation core, an integral component of the automated machine learning pipeline behind predict_ml().

Proposed Changes:

Expected Outcome: Once implemented, model_recommendation_core() will automate model selection in machine learning workflows. By evaluating models across multiple metrics and recommending them based on composite scores, it will streamline the model selection process, reduce manual effort, and improve the effectiveness of machine learning experiments.

Additional Context: The proposed model recommendation core addresses the growing demand for automation and efficiency in machine learning model selection. By leveraging composite scores and weighted metrics, it enables a nuanced comparison of models and facilitates the identification of top-performing candidates. This enhancement aligns with our commitment to advancing automation and productivity in data science workflows.
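For intuition, here is a toy comparison (the numbers are invented for the example and do not come from DataSafari): with equal weights the two candidate models tie, but prioritizing recall flips the ranking toward model B.

# Toy numbers to show how metric weights change a ranking (not real results)
model_a = {'accuracy': 0.90, 'recall': 0.70}
model_b = {'accuracy': 0.80, 'recall': 0.80}

def weighted(scores, weights):
    # weighted average of the per-metric scores
    return sum(scores[m] * w for m, w in weights.items()) / sum(weights.values())

equal = {'accuracy': 1, 'recall': 1}
recall_first = {'accuracy': 1, 'recall': 3}

print(weighted(model_a, equal), weighted(model_b, equal))                # 0.80 vs 0.80 -> tie
print(weighted(model_a, recall_first), weighted(model_b, recall_first))  # 0.75 vs 0.80 -> B preferred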

ETA444 commented 4 months ago

Implementation Summary

model_recommendation_core() recommends the most suitable machine learning models based on a composite score that synthesizes multiple evaluation metrics, each weighted according to the specified priorities. It is the component of the predict_ml() pipeline responsible for selecting top-performing models tailored to the analysis at hand.

Code Breakdown

# Imports used by the excerpts below
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate

def calculate_composite_score(scores: dict, metric_weights: dict) -> float:
    """Combine per-metric scores into a single weighted average."""
    if not isinstance(scores, dict) or not isinstance(metric_weights, dict):
        raise TypeError("Both 'scores' and 'metric_weights' must be dictionaries.")
    if not scores or not metric_weights:
        raise ValueError("'scores' and 'metric_weights' cannot be empty.")
    # every metric that was scored must have a corresponding weight
    missing_metrics = set(scores.keys()) - set(metric_weights.keys())
    if missing_metrics:
        raise ValueError(f"Missing weights for metrics: {', '.join(missing_metrics)}")
    # weighted average: each score is scaled by its weight, then normalized by the total weight
    return sum(score * metric_weights.get(metric, 0) for metric, score in scores.items()) / sum(metric_weights.values())
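For illustration, this is how the helper behaves on hand-picked numbers; the metric names and weights below are made up for the example and are not the defaults used by predict_ml().

# Hypothetical scores and weights, purely for illustration
example_scores = {'accuracy': 0.90, 'f1': 0.80}
example_weights = {'accuracy': 3, 'f1': 1}  # accuracy prioritized 3:1 over f1

# (0.90 * 3 + 0.80 * 1) / (3 + 1) = 0.875
print(calculate_composite_score(example_scores, example_weights))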
# Input validation at the top of model_recommendation_core() (excerpt)
if not isinstance(x_train, (pd.DataFrame, np.ndarray)):
    raise TypeError("'x_train' must be a pandas DataFrame or NumPy ndarray.")
if not isinstance(y_train, (pd.Series, np.ndarray)):
    raise TypeError("'y_train' must be a pandas Series or NumPy ndarray.")
if task_type not in ['classification', 'regression']:
    raise ValueError("'task_type' must be 'classification' or 'regression'.")
# Additional checks for 'priority_metrics', 'cv', and 'n_top_models'...
# Cross-validate every candidate model and compute its composite score.
# 'models', 'scoring', and 'metric_weights' are constructed earlier in the
# function from 'task_type' and 'priority_metrics'.
model_scores = {}
composite_scores = {}
for model_name, model in models.items():
    scores = cross_validate(model, x_train, y_train, cv=cv, scoring=scoring)
    average_scores = {metric: np.mean(scores[f'test_{metric}']) for metric in scoring}
    composite_score = calculate_composite_score(average_scores, metric_weights)
    model_scores[model_name] = average_scores
    composite_scores[model_name] = composite_score

# Keep the n_top_models models with the highest composite scores
top_models = sorted(composite_scores, key=composite_scores.get, reverse=True)[:n_top_models]
# Optional console report of the recommendations
if verbose > 0:
    print("< MODEL RECOMMENDATIONS >")
    if priority_metrics:
        print("Priority metrics used in scoring:")
        for metric in priority_metrics:
            print(f" - {metric}")
    else:
        print("No priority metrics specified.")
    for model_name in top_models:
        print(f"{model_name}: Composite Score = {composite_scores[model_name]:.4f}")
        for metric, score in model_scores[model_name].items():
            print(f"  {metric}: {score:.4f}")

Link to Full Code