ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Develop model_tuning_core() for predict_ml() #111

Closed · ETA444 closed 5 months ago

ETA444 commented 6 months ago

Title: Implement Model Tuning Core for Automated Machine Learning Pipeline - back-end of predict_ml()

Description: The model_tuning_core() function is designed to conduct hyperparameter tuning on a set of machine learning models using various tuning methods and parameter grids. It systematically explores the hyperparameter space of the given models, supporting customization of the tuning process through multiple parameters, and returns the best-tuned models along with their scores.

Proposed Features:

- Support for multiple tuning methods (grid search, random search, and Bayesian optimization), selected via priority_tuners.
- Per-model custom parameter grids supplied through custom_param_grids.
- Configurable iteration budgets for random search (n_iter_random) and Bayesian optimization (n_iter_bayesian).
- Cross-validated scoring against a refit metric, returning the best-tuned model and its score for each input model.

Expected Outcome: Upon implementation, the model_tuning_core() function will enhance the automated machine learning pipeline by enabling systematic hyperparameter tuning of machine learning models. This enhancement will streamline the model tuning process, improve model performance, and facilitate the selection of optimal models for different tasks.

Additional Context: The proposed model tuning core addresses the need for efficient hyperparameter optimization in machine learning workflows. By incorporating multiple tuning methods and customizable parameters, it provides flexibility and scalability in model tuning, catering to diverse analysis requirements. This enhancement aligns with the objective of advancing automation and productivity in machine learning experimentation and model development.
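To make the intended interface concrete, here is a minimal usage sketch. The signature is inferred from the parameters referenced in this issue and in the code breakdown below; it is an assumption, not the confirmed API:

# Hypothetical usage sketch: the signature is inferred from parameters
# referenced in this issue (models, priority_tuners, custom_param_grids, etc.).
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

tuned_models = model_tuning_core(
    x_train, y_train,
    models={
        'LogisticRegression': LogisticRegression(max_iter=1000),
        'RandomForestClassifier': RandomForestClassifier(),
    },
    priority_tuners=['grid', 'random', 'bayesian'],
    custom_param_grids={
        'RandomForestClassifier': {'n_estimators': [100, 300], 'max_depth': [None, 10]},
    },
    n_iter_random=10,
    n_iter_bayesian=50,
    refit_metric='accuracy',
    cv=5,
    n_jobs=-1,
    random_state=42,
    verbose=1,
)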

ETA444 commented 5 months ago

Implementation Summary

model_tuning_core() is a comprehensive function designed to fine-tune machine learning models by systematically exploring their hyperparameter spaces using grid search, random search, or Bayesian optimization. Various customizable parameters let the tuning process adapt to the specific needs of the task.
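The custom_param_grids lookup in the code below implies a mapping keyed by model name. As a sketch (the exact expected shape is assumed, not confirmed), plain value lists suit grid and random search, while skopt search-space objects give Bayesian optimization continuous ranges:

# Assumed shape of custom_param_grids, keyed by model name as in the
# lookup custom_param_grids.get(model_name, {}) in the code below.
from skopt.space import Integer, Real

custom_param_grids = {
    # Plain value lists work for GridSearchCV and RandomizedSearchCV
    'RandomForestClassifier': {
        'n_estimators': [100, 200, 500],
        'max_depth': [None, 10, 30],
    },
    # skopt search spaces let BayesSearchCV explore continuous ranges
    'LogisticRegression': {
        'C': Real(1e-3, 1e2, prior='log-uniform'),
        'max_iter': Integer(100, 1000),
    },
}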

Code Breakdown

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from skopt import BayesSearchCV  # Bayesian optimization backend

# Validate core inputs before tuning
if not isinstance(x_train, (pd.DataFrame, np.ndarray)):
    raise TypeError("model_tuning_core(): 'x_train' must be a pandas DataFrame or NumPy ndarray.")
if not isinstance(y_train, (pd.Series, np.ndarray)):
    raise TypeError("model_tuning_core(): 'y_train' must be a pandas Series or NumPy ndarray.")
# Additional type and configuration validations...

# Sensible defaults when optional parameters are omitted
n_iter_random = 10 if n_iter_random is None else n_iter_random
n_iter_bayesian = 50 if n_iter_bayesian is None else n_iter_bayesian
n_jobs = -1 if n_jobs is None else n_jobs  # default to all available cores

# Announce the configuration before tuning begins
if verbose > 0:
    print("Tuning process initiated...")
    print(f"Using {', '.join(priority_tuners)} tuners with priority metrics {', '.join(priority_metrics)}")

tuned_models = {}
for model_name, model in models.items():
    param_grid = custom_param_grids.get(model_name, {})
    # Pick the highest-priority tuner requested: bayesian > random > grid
    if 'bayesian' in priority_tuners:
        tuner = BayesSearchCV(model, search_spaces=param_grid, n_iter=n_iter_bayesian,
                              scoring=refit_metric, n_jobs=n_jobs, cv=cv, random_state=random_state)
    elif 'random' in priority_tuners:
        tuner = RandomizedSearchCV(model, param_distributions=param_grid, n_iter=n_iter_random,
                                   scoring=refit_metric, n_jobs=n_jobs, cv=cv, random_state=random_state)
    elif 'grid' in priority_tuners:
        tuner = GridSearchCV(model, param_grid=param_grid, scoring=refit_metric, n_jobs=n_jobs, cv=cv)
    else:
        continue  # no requested tuner applies to this model
    tuner.fit(x_train, y_train)
    # Keep the refit best estimator and its cross-validated score
    tuned_models[model_name] = {'best_model': tuner.best_estimator_,
                                'best_score': tuner.best_score_}

if verbose > 0:
    for model_name, model_info in tuned_models.items():
        print(f"Best model for {model_name}: Score - {model_info['best_score']}")

Link to Full Code