ETA444 / datasafari

DataSafari simplifies complex data science tasks into straightforward, powerful one-liners.
https://datasafari.dev
GNU General Public License v3.0

Develop model_tuning_core() for predict_ml() #111

Closed · ETA444 closed 5 months ago

ETA444 commented 6 months ago

Title: Implement Model Tuning Core for Automated Machine Learning Pipeline - back-end of predict_ml()

Description: The model_tuning_core() function is designed to conduct hyperparameter tuning on a set of machine learning models using various tuning methods and parameter grids. It systematically explores the hyperparameter space of the given models, supporting customization of the tuning process through multiple parameters, and returns the best-tuned models along with their scores.

Proposed Features:

- Support for multiple tuning methods (grid search, random search, and Bayesian optimization), selected via priority_tuners.
- Per-model custom parameter grids supplied through custom_param_grids.
- Configurable iteration budgets for random search (n_iter_random) and Bayesian optimization (n_iter_bayesian).
- Cross-validated scoring against a refit metric, returning the best-tuned model and its score for each input model.

Expected Outcome: Upon implementation, the model_tuning_core() function will enhance the automated machine learning pipeline by enabling systematic hyperparameter tuning of machine learning models. This enhancement will streamline the model tuning process, improve model performance, and facilitate the selection of optimal models for different tasks.

Additional Context: The proposed model tuning core addresses the need for efficient hyperparameter optimization in machine learning workflows. By incorporating multiple tuning methods and customizable parameters, it provides flexibility and scalability in model tuning, catering to diverse analysis requirements. This enhancement aligns with the objective of advancing automation and productivity in machine learning experimentation and model development.
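To make the intended interface concrete, here is a minimal usage sketch. The signature is inferred from the parameters referenced in this issue and in the code breakdown below; it is an assumption, not the confirmed API:

# Hypothetical usage sketch: the signature is inferred from parameters
# referenced in this issue (models, priority_tuners, custom_param_grids, etc.).
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

tuned_models = model_tuning_core(
    x_train, y_train,
    models={
        'LogisticRegression': LogisticRegression(max_iter=1000),
        'RandomForestClassifier': RandomForestClassifier(),
    },
    priority_tuners=['grid', 'random', 'bayesian'],
    custom_param_grids={
        'RandomForestClassifier': {'n_estimators': [100, 300], 'max_depth': [None, 10]},
    },
    n_iter_random=10,
    n_iter_bayesian=50,
    refit_metric='accuracy',
    cv=5,
    n_jobs=-1,
    random_state=42,
    verbose=1,
)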

ETA444 commented 5 months ago

Implementation Summary

model_tuning_core() is a comprehensive function designed to fine-tune machine learning models by systematically exploring their hyperparameter spaces using grid search, random search, or Bayesian optimization. Various customizable parameters let the tuning process adapt to the specific needs of the task.
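The custom_param_grids lookup in the code below implies a mapping keyed by model name. As a sketch (the exact expected shape is assumed, not confirmed), plain value lists suit grid and random search, while skopt search-space objects give Bayesian optimization continuous ranges:

# Assumed shape of custom_param_grids, keyed by model name as in the
# lookup custom_param_grids.get(model_name, {}) in the code below.
from skopt.space import Integer, Real

custom_param_grids = {
    # Plain value lists work for GridSearchCV and RandomizedSearchCV
    'RandomForestClassifier': {
        'n_estimators': [100, 200, 500],
        'max_depth': [None, 10, 30],
    },
    # skopt search spaces let BayesSearchCV explore continuous ranges
    'LogisticRegression': {
        'C': Real(1e-3, 1e2, prior='log-uniform'),
        'max_iter': Integer(100, 1000),
    },
}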

Code Breakdown

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from skopt import BayesSearchCV  # Bayesian optimization backend

# Validate core inputs before tuning
if not isinstance(x_train, (pd.DataFrame, np.ndarray)):
    raise TypeError("model_tuning_core(): 'x_train' must be a pandas DataFrame or NumPy ndarray.")
if not isinstance(y_train, (pd.Series, np.ndarray)):
    raise TypeError("model_tuning_core(): 'y_train' must be a pandas Series or NumPy ndarray.")
# Additional type and configuration validations...

# Sensible defaults when optional parameters are omitted
n_iter_random = 10 if n_iter_random is None else n_iter_random
n_iter_bayesian = 50 if n_iter_bayesian is None else n_iter_bayesian
n_jobs = -1 if n_jobs is None else n_jobs  # default to all available cores

# Announce the configuration before tuning begins
if verbose > 0:
    print("Tuning process initiated...")
    print(f"Using {', '.join(priority_tuners)} tuners with priority metrics {', '.join(priority_metrics)}")

tuned_models = {}
for model_name, model in models.items():
    param_grid = custom_param_grids.get(model_name, {})
    # Pick the highest-priority tuner requested: bayesian > random > grid
    if 'bayesian' in priority_tuners:
        tuner = BayesSearchCV(model, search_spaces=param_grid, n_iter=n_iter_bayesian,
                              scoring=refit_metric, n_jobs=n_jobs, cv=cv, random_state=random_state)
    elif 'random' in priority_tuners:
        tuner = RandomizedSearchCV(model, param_distributions=param_grid, n_iter=n_iter_random,
                                   scoring=refit_metric, n_jobs=n_jobs, cv=cv, random_state=random_state)
    elif 'grid' in priority_tuners:
        tuner = GridSearchCV(model, param_grid=param_grid, scoring=refit_metric, n_jobs=n_jobs, cv=cv)
    else:
        continue  # no requested tuner applies to this model
    tuner.fit(x_train, y_train)
    # Keep the refit best estimator and its cross-validated score
    tuned_models[model_name] = {'best_model': tuner.best_estimator_,
                                'best_score': tuner.best_score_}

if verbose > 0:
    for model_name, model_info in tuned_models.items():
        print(f"Best model for {model_name}: Score - {model_info['best_score']}")

Link to Full Code