SaintAngeLs / CS-MINI-2024Z-AutoML_project_1

Analyze the tunability of machine learning models with Grid Search, Random Search, and Bayesian Optimization. This project explores hyperparameter tuning methods on diverse datasets, comparing efficiency, stability, and performance. Featuring Random Forest, XGBoost, Elastic Net, and Gradient Boosting.

Machine Learning Hyperparameter Tuning Analysis

This repository contains the code and documentation for a research project focused on analyzing the tunability of machine learning algorithms. The study investigates various hyperparameter tuning methods applied to different models on multiple datasets, assessing each method's effectiveness, stability, and computational efficiency.

Project Overview

This project explores the tunability of selected machine learning models using three hyperparameter optimization methods:

  1. Grid Search
  2. Random Search
  3. Bayesian Optimization

The primary objectives are:

  1. Compare the predictive performance achieved by each tuning method across models and datasets.
  2. Assess the stability of the results produced by each method.
  3. Measure the computational efficiency (runtime and memory usage) of each method.

Datasets and Models

Datasets

The study utilizes ten datasets:

Each dataset is preprocessed through standardization to ensure uniform scaling of features, enabling consistent performance across different models.
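
A minimal sketch of this preprocessing step, assuming scikit-learn's StandardScaler; the helper name standardize_splits is illustrative, not necessarily the function used in the repository:

    from sklearn.preprocessing import StandardScaler

    def standardize_splits(X_train, X_test):
        """Fit the scaler on the training split only, then transform both splits."""
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        return X_train_scaled, X_test_scaled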

Models

The following machine learning models were chosen for this study:

  1. Random Forest
  2. XGBoost
  3. Elastic Net
  4. Gradient Boosting
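
The sketch below only illustrates how these estimators might be instantiated with scikit-learn and xgboost; the actual configurations live in the project code, and the Random Forest search space shown is an illustrative assumption:

    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import ElasticNet
    from xgboost import XGBClassifier

    # Illustrative model zoo; the project pairs each estimator with its own search space.
    models = {
        "random_forest": RandomForestClassifier(random_state=42),
        "xgboost": XGBClassifier(random_state=42),
        "gradient_boosting": GradientBoostingClassifier(random_state=42),
        "elastic_net": ElasticNet(random_state=42),  # used for regression tasks
    }

    # Example (assumed) hyperparameter space for Random Forest.
    rf_space = {
        "n_estimators": [100, 300, 500],
        "max_depth": [None, 10, 30],
        "min_samples_split": [2, 5, 10],
    }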

Hyperparameter Tuning Methods

  1. Grid Search: Exhaustively evaluates every combination in the specified hyperparameter grid, which guarantees the best combination within the grid is found but is computationally expensive.
  2. Random Search: Randomly samples a subset of hyperparameter combinations, allowing a quicker search at the potential expense of finding the absolute optimum.
  3. Bayesian Optimization: Uses probabilistic models to predict promising areas of the hyperparameter space, improving efficiency over random search while maintaining high accuracy.

Each method employs 3-fold cross-validation to enhance result stability and minimize overfitting risk.
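
A sketch of how the three tuners can be set up with scikit-learn and scikit-optimize; the estimator, search space, and scoring metric are placeholders, while the 3-fold cross-validation and the 15-iteration budget mirror the description in this README:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from skopt import BayesSearchCV

    estimator = RandomForestClassifier(random_state=42)
    space = {"n_estimators": [100, 300, 500], "max_depth": [5, 10, 30]}

    # Exhaustive search over the full grid.
    grid_search = GridSearchCV(estimator, space, cv=3, scoring="accuracy")

    # Random sampling of 15 configurations from the same space.
    random_search = RandomizedSearchCV(estimator, space, n_iter=15, cv=3,
                                       scoring="accuracy", random_state=42)

    # Bayesian optimization with a 15-evaluation budget.
    bayes_search = BayesSearchCV(estimator, space, n_iter=15, cv=3,
                                 scoring="accuracy", random_state=42)

    # All three expose the same interface after fitting:
    # tuner.fit(X_train, y_train); tuner.best_score_; tuner.best_params_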

Experimental Procedure

The experiment was conducted in Python, utilizing libraries like scikit-learn and scikit-optimize. For each model-dataset combination, the following steps were executed:

  1. Data Loading and Preprocessing: Each dataset is loaded and standardized.
  2. Task Type Detection: The experiment checks if the dataset is suitable for the selected model (e.g., classification vs. regression).
  3. Hyperparameter Optimization: Each tuning method is applied with 15 iterations:
    • For each method, the model's performance (e.g., accuracy or AUC), best hyperparameters, runtime, and memory usage are recorded (see the sketch after this list).
  4. Result Logging: Results from each iteration are saved in detailed_tuning_results.csv. The best scores and parameters for each method are summarized in best_tuning_results.csv.
  5. Plot Generation: Graphs depicting optimization performance over iterations are generated and saved in the assets/ directory.
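
A condensed sketch of steps 3 and 4 for a single model-dataset pair, reusing the tuners from the previous snippet; the column names and the tracemalloc-based memory measurement are illustrative assumptions rather than the exact logging format used by the project:

    import time
    import tracemalloc

    import pandas as pd

    def run_tuners(tuners, X_train, y_train):
        """Fit each tuner and record its score, best parameters, runtime, and peak memory."""
        records = []
        for name, tuner in tuners.items():
            tracemalloc.start()
            start = time.perf_counter()
            tuner.fit(X_train, y_train)
            elapsed = time.perf_counter() - start
            _, peak_bytes = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            records.append({
                "method": name,
                "best_score": tuner.best_score_,
                "best_params": tuner.best_params_,
                "time_s": elapsed,
                "peak_memory_mb": peak_bytes / 1e6,
            })
        return pd.DataFrame(records)

    # Example usage (hypothetical variable names):
    # results = run_tuners({"grid": grid_search, "random": random_search,
    #                       "bayes": bayes_search}, X_train, y_train)
    # results.to_csv("detailed_tuning_results.csv", index=False)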

Results and Analysis

The results of the experiments are documented and visualized through multiple figures, saved in the assets/ directory.

Key Figures

Example Figures

Each figure is stored in the assets/ directory.

Summary of Findings

Code Structure

Dependencies

The experiments rely on Python with scikit-learn and scikit-optimize; the full list of packages is given in requirements.txt.

How to Run

  1. Clone the repository:

    git clone https://github.com/SaintAngeLs/CS-MINI-2024Z-AutoML_project_1.git
    cd CS-MINI-2024Z-AutoML_project_1
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run the main experiment:

    python main.py

Results Directory

The results are stored in the results/ directory, including:

  1. detailed_tuning_results.csv: per-iteration scores, best hyperparameters, runtime, and memory usage for each tuning method.
  2. best_tuning_results.csv: the best score and parameter set found by each method.
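
For example, the logs can be inspected with pandas; the results/ paths below assume the CSV files listed above are written into that directory:

    import pandas as pd

    detailed = pd.read_csv("results/detailed_tuning_results.csv")
    best = pd.read_csv("results/best_tuning_results.csv")
    print(best.head())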

License

This project is licensed under the MIT License.

References

The concept of tunability in machine learning was adapted from:

Contact

For any inquiries, please contact @SaintAngeLs.