ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Develop model_recommendation_core_inference() for predict_ml() #115

Closed ETA444 closed 4 months ago

ETA444 commented 4 months ago

Title: Develop Model Recommendation Core for Statistical Inference (second back-end pipeline in predict_ml())

Description: This issue proposes an enhanced version of the model_recommendation_core_inference() function. The function recommends the top statistical models for inference based on a user-supplied formula and user-specified preferences. It evaluates statistical models from statsmodels and determines from the nature of the target variable whether each one suits a regression or a classification task.
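To illustrate what deciding between regression and classification from the target variable could look like, here is a minimal sketch. The helper name infer_task_type, the max_classes threshold, and the heuristic itself are assumptions made for illustration; the issue does not specify how DataSafari actually makes this decision.

```python
from numbers import Number


def infer_task_type(target_values, max_classes=10):
    """Hypothetical heuristic: guess the task type from the target's values.

    Non-numeric targets, or numeric targets with at most `max_classes`
    distinct values, are treated as classification; everything else is
    treated as regression. The threshold is an illustrative assumption.
    """
    values = list(target_values)
    if not values:
        raise ValueError("The target variable contains no values.")

    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    n_unique = len(set(values))

    if not all_numeric or n_unique <= max_classes:
        return "classification"
    return "regression"


label_target = ["spam", "ham", "spam"]
binary_target = [0, 1, 1, 0]
continuous_target = [i * 0.5 for i in range(50)]

task_for_labels = infer_task_type(label_target)
task_for_binary = infer_task_type(binary_target)
task_for_continuous = infer_task_type(continuous_target)
```

A threshold on the number of distinct values is a blunt rule: a small sample of a continuous target would be misread as classification, so a real implementation would likely also consider the column's dtype.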

Proposed Changes:

Expected Outcome: Once complete, model_recommendation_core_inference() will recommend the top statistical models for inference. By evaluating models dynamically and reporting detailed performance metrics for each, it will streamline model selection and help users make informed decisions in statistical inference tasks.

Additional Context: The proposed development aligns with the increasing demand for automated and efficient model selection in statistical inference tasks. By incorporating dynamic evaluation, customizable model parameters, and robust error handling, the enhanced function aims to address the diverse needs of users in statistical modeling and analysis. This development initiative underscores our commitment to advancing automation and productivity in statistical inference workflows.

ETA444 commented 4 months ago

Implementation Summary

model_recommendation_core_inference() is designed to recommend the best statistical models for inference tasks based on specific criteria provided by the user. It evaluates models from the statsmodels library, applicable for regression or classification based on the target variable's nature. The function leverages a patsy formula for model specification, allowing for dynamic selection of models suited to the data's characteristics.
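As a rough illustration of fitting statsmodels models from a single patsy-style formula, the sketch below uses the statsmodels formula API on synthetic data. The data, the candidate_models dictionary, and the choice of AIC and log-likelihood as metrics are assumptions for demonstration only, not the function's actual model set or metrics.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, made up for this example: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(scale=0.5, size=200)})

formula = "y ~ x"

# Hypothetical candidate set; each entry is a formula-API constructor.
candidate_models = {"OLS": smf.ols, "GLM": smf.glm}

fitted = {}
for name, model_func in candidate_models.items():
    # Every formula-API constructor accepts (formula, data) and returns
    # an unfitted model; .fit() estimates the parameters.
    result = model_func(formula, df).fit()
    fitted[name] = {
        "slope": result.params["x"],
        "AIC": result.aic,
        "Log-Likelihood": result.llf,
    }

for name, metrics in fitted.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Because every constructor in the formula API takes the same (formula, data) arguments, a dictionary of constructors is enough to loop over several model families with one specification.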

Code Breakdown

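# Validate the formula, then check that its target variable exists in the DataFrame.
if formula.count('~') != 1:
    raise ValueError("The formula must contain exactly one '~'.")
y_col = formula.split('~')[0].strip()  # target name, assumed to be the left-hand side of '~'
if y_col not in df.columns:
    raise ValueError(f"The target variable '{y_col}' specified in the formula is not found in the DataFrame.")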
# Restrict the candidates to priority_models when given; otherwise evaluate all models.
if priority_models:
    models_to_evaluate = {name: model for name, model in all_models.items() if name in priority_models}
else:
    models_to_evaluate = all_models
model_results = {}
for model_name, model_func in models_to_evaluate.items():
    model = model_func(formula, df, **model_kwargs.get(model_name, {})).fit()
    model_results[model_name] = {'model': model, 'metrics': evaluate_model(model)}
# Negate AIC and BIC (lower is better) so that a descending sort ranks the best models first.
sorted_models = []
for name, details in model_results.items():
    adjusted_metrics = {metric: (-value if metric in ['AIC', 'BIC'] else value) for metric, value in details['metrics'].items()}
    sorted_models.append((name, adjusted_metrics))
top_models = sorted(sorted_models, key=lambda x: list(x[1].values()), reverse=True)[:n_top_models]
for model_name, metrics in top_models:
    print(f"Model: {model_name}, Metrics: {metrics}")
    if verbose > 1:
        print(model_results[model_name]['model'].summary())
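if verbose > 0:
    # top_models is a list of (name, adjusted_metrics) tuples, so look up each model's details by name.
    for name, _ in top_models:
        details = model_results[name]
        print(f"Top Model: {name}")
        for metric, value in details['metrics'].items():
            print(f"{metric}: {value:.2f}")
        if verbose > 1:
            print(details['model'].summary())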
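The ranking step in the breakdown above can be tried in isolation with plain dictionaries. The following is a minimal sketch: the function name rank_models, its parameters, and the metric values are made up for illustration, and the model names are only labels rather than fitted models.

```python
def rank_models(model_metrics, n_top=2, lower_is_better=("AIC", "BIC")):
    """Rank models by their metrics, best first (illustrative helper).

    Metrics named in `lower_is_better` are negated so that a single
    descending sort puts the preferred models at the front.
    """
    adjusted = []
    for name, metrics in model_metrics.items():
        signed = {
            metric: (-value if metric in lower_is_better else value)
            for metric, value in metrics.items()
        }
        adjusted.append((name, signed))

    # Lists compare element by element, so the first metric dominates
    # and later metrics only break ties.
    ranked = sorted(adjusted, key=lambda item: list(item[1].values()), reverse=True)
    return ranked[:n_top]


# Made-up metric values, not results from real fits.
example_metrics = {
    "OLS": {"AIC": 310.0, "BIC": 320.0},
    "GLM": {"AIC": 305.0, "BIC": 330.0},
    "Logit": {"AIC": 400.0, "BIC": 410.0},
}

top = rank_models(example_metrics)
top_names = [name for name, _ in top]
print(top_names)  # ['GLM', 'OLS']
```

Here GLM ranks first despite having a worse BIC than OLS, because the sort key is compared lexicographically and AIC comes first in each metrics dictionary. The order in which metrics are stored therefore determines their priority.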

Link to Full Code