ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Develop model_recommendation_core() for predict_ml() #110

Closed: ETA444 closed this issue 4 months ago

ETA444 commented 5 months ago

Title: Implement Model Recommendation Core for Automated Machine Learning Pipeline - backend of predict_ml()

Description: The model_recommendation_core() function will automate the recommendation of top-performing machine learning models based on composite scores derived from multiple evaluation metrics. This proposal outlines the functionality and design of the model recommendation core, an integral component of the automated machine learning pipeline behind predict_ml().

Proposed Changes:

Expected Outcome: Once implemented, model_recommendation_core() will automate model selection in machine learning workflows. By evaluating models across multiple metrics and recommending them based on composite scores, it will streamline the model selection process, reduce manual effort, and improve the effectiveness of machine learning experiments.

Additional Context: The proposed model recommendation core addresses the growing demand for automation and efficiency in machine learning model selection. By leveraging composite scores and weighted metrics, it enables a nuanced comparison of models and facilitates the identification of top-performing candidates. This enhancement aligns with our commitment to advancing automation and productivity in data science workflows.
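For intuition, here is a toy comparison (the numbers are invented for the example and do not come from DataSafari): with equal weights the two candidate models tie, but prioritizing recall flips the ranking toward model B.

# Toy numbers to show how metric weights change a ranking (not real results)
model_a = {'accuracy': 0.90, 'recall': 0.70}
model_b = {'accuracy': 0.80, 'recall': 0.80}

def weighted(scores, weights):
    # weighted average of the per-metric scores
    return sum(scores[m] * w for m, w in weights.items()) / sum(weights.values())

equal = {'accuracy': 1, 'recall': 1}
recall_first = {'accuracy': 1, 'recall': 3}

print(weighted(model_a, equal), weighted(model_b, equal))                # 0.80 vs 0.80 -> tie
print(weighted(model_a, recall_first), weighted(model_b, recall_first))  # 0.75 vs 0.80 -> B preferred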

ETA444 commented 4 months ago

Implementation Summary

model_recommendation_core() recommends the most suitable machine learning models based on a composite score that synthesizes multiple evaluation metrics, each weighted according to the specified priorities. It is the component of the predict_ml() pipeline responsible for selecting top-performing models tailored to the analysis at hand.

Code Breakdown

# Imports used by the excerpts below
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate

def calculate_composite_score(scores: dict, metric_weights: dict) -> float:
    """Combine per-metric scores into a single weighted average."""
    if not isinstance(scores, dict) or not isinstance(metric_weights, dict):
        raise TypeError("Both 'scores' and 'metric_weights' must be dictionaries.")
    if not scores or not metric_weights:
        raise ValueError("'scores' and 'metric_weights' cannot be empty.")
    # every metric that was scored must have a corresponding weight
    missing_metrics = set(scores.keys()) - set(metric_weights.keys())
    if missing_metrics:
        raise ValueError(f"Missing weights for metrics: {', '.join(missing_metrics)}")
    # weighted average: each score is scaled by its weight, then normalized by the total weight
    return sum(score * metric_weights.get(metric, 0) for metric, score in scores.items()) / sum(metric_weights.values())
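For illustration, this is how the helper behaves on hand-picked numbers; the metric names and weights below are made up for the example and are not the defaults used by predict_ml().

# Hypothetical scores and weights, purely for illustration
example_scores = {'accuracy': 0.90, 'f1': 0.80}
example_weights = {'accuracy': 3, 'f1': 1}  # accuracy prioritized 3:1 over f1

# (0.90 * 3 + 0.80 * 1) / (3 + 1) = 0.875
print(calculate_composite_score(example_scores, example_weights))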
# Input validation at the top of model_recommendation_core() (excerpt)
if not isinstance(x_train, (pd.DataFrame, np.ndarray)):
    raise TypeError("'x_train' must be a pandas DataFrame or NumPy ndarray.")
if not isinstance(y_train, (pd.Series, np.ndarray)):
    raise TypeError("'y_train' must be a pandas Series or NumPy ndarray.")
if task_type not in ['classification', 'regression']:
    raise ValueError("'task_type' must be 'classification' or 'regression'.")
# Additional checks for 'priority_metrics', 'cv', and 'n_top_models'...
# Cross-validate every candidate model and compute its composite score.
# 'models', 'scoring', and 'metric_weights' are constructed earlier in the
# function from 'task_type' and 'priority_metrics'.
model_scores = {}
composite_scores = {}
for model_name, model in models.items():
    scores = cross_validate(model, x_train, y_train, cv=cv, scoring=scoring)
    average_scores = {metric: np.mean(scores[f'test_{metric}']) for metric in scoring}
    composite_score = calculate_composite_score(average_scores, metric_weights)
    model_scores[model_name] = average_scores
    composite_scores[model_name] = composite_score

# Keep the n_top_models models with the highest composite scores
top_models = sorted(composite_scores, key=composite_scores.get, reverse=True)[:n_top_models]
# Optional console report of the recommendations
if verbose > 0:
    print("< MODEL RECOMMENDATIONS >")
    if priority_metrics:
        print("Priority metrics used in scoring:")
        for metric in priority_metrics:
            print(f" - {metric}")
    else:
        print("No priority metrics specified.")
    for model_name in top_models:
        print(f"{model_name}: Composite Score = {composite_scores[model_name]:.4f}")
        for metric, score in model_scores[model_name].items():
            print(f"  {metric}: {score:.4f}")

Link to Full Code