DeepBlockDeepak / kaggle_titanic

Titanic Survivor Predictor: A multi-model machine learning project that forecasts survival outcomes of Titanic passengers. Engineered from historical data, refined with feature selection, tested with CI/CD, and reporting validation metrics.
https://www.kaggle.com/c/titanic

Refactor Project Structure for Model Modularity #5

Closed DeepBlockDeepak closed 7 months ago

DeepBlockDeepak commented 8 months ago

Overview

Restructure the project to better modularize the different machine learning models (Random Forest, SVM, Decision Tree) within the Titanic Survival Prediction project. The goal is a clear, organized, and scalable codebase that is easy to navigate, makes adding new models straightforward, and gives flexibility in which model gets executed. This restructuring is a significant step towards keeping the project professional and maintainable.

Current Structure

The current structure has a single entry point (main.py), with all model-related code in the src/ directory. This setup, while functional, can become cluttered as I add more models or functionality.

Proposed Structure

The proposed structure introduces subdirectories within src/ for each model and uses command-line arguments in main.py to run different models. This enhances clarity and maintains a single entry point for the project.

New Project Tree

kaggle_titanic/
│
├── main.py                 # Entry point, handles command-line arguments
├── README.md
├── poetry.lock
├── pyproject.toml
│
├── data/                   # Data files
│
├── models/                 # Saved model files
│
├── outputs/                # Output visualizations, results
│
├── src/                    # Source code
│   ├── __init__.py
│   ├── preprocess.py       # Common preprocessing code
│   ├── evaluate_model.py   # Model evaluation code
│   ├── features.py         # Feature engineering
│   │
│   ├── random_forest/      # RandomForest-specific code
│   │   ├── __init__.py
│   │   └── train.py
│   │
│   ├── decision_tree/      # DecisionTree-specific code
│   │   ├── __init__.py
│   │   ├── decision_tree.py
│   │   └── train.py
│   │
│   └── svm/                # SVM-specific code
│       ├── __init__.py
│       └── train.py
├── submission.csv
├── tests/
└── user_passenger.py

main.py Modification

main.py will use Python's argparse module to handle command-line arguments, so the model to run is selected with a required --model option.

Benefits

Tasks

DeepBlockDeepak commented 8 months ago

New main.py should resemble:

import argparse

# Each model package exposes its own training entry point.
from src.random_forest.train import run_random_forest
from src.decision_tree.train import run_decision_tree
from src.svm.train import run_svm


def main():
    parser = argparse.ArgumentParser(description="Titanic Survival Prediction Project")
    # --model is required; argparse rejects any value not listed in choices.
    parser.add_argument(
        '--model',
        choices=['random_forest', 'decision_tree', 'svm'],
        required=True,
        help="Which model to train and evaluate",
    )

    args = parser.parse_args()

    # Dispatch to the training routine for the selected model.
    if args.model == 'random_forest':
        run_random_forest()
    elif args.model == 'decision_tree':
        run_decision_tree()
    elif args.model == 'svm':
        run_svm()


if __name__ == "__main__":
    main()
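
A possible variation on the main.py above, as a rough sketch rather than the final design: keep the models in a dictionary so that adding a new model only means adding one entry, instead of another elif branch plus another item in choices. The MODEL_RUNNERS and build_parser names are made up for illustration, and the three run_* functions here are stand-in stubs that just return the model name; in the real project they would be the imports from src/*/train.py. main() also accepts an optional argv list, which is an assumption added here so the argument handling can be exercised from tests/ without a subprocess.

```python
import argparse
from typing import Callable, Dict, Optional, Sequence


# Stand-in stubs (assumption): in the real project these would be imported
# from src.random_forest.train, src.decision_tree.train and src.svm.train.
def run_random_forest() -> str:
    return "random_forest"


def run_decision_tree() -> str:
    return "decision_tree"


def run_svm() -> str:
    return "svm"


# Hypothetical registry mapping each --model value to its training entry point.
# Adding a model means adding one entry here.
MODEL_RUNNERS: Dict[str, Callable[[], str]] = {
    "random_forest": run_random_forest,
    "decision_tree": run_decision_tree,
    "svm": run_svm,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Titanic Survival Prediction Project")
    # The valid choices are derived from the registry, so the two cannot drift apart.
    parser.add_argument(
        "--model",
        choices=sorted(MODEL_RUNNERS),
        required=True,
        help="Which model to train and evaluate",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    # argv=None makes argparse fall back to sys.argv[1:], as in the original.
    args = build_parser().parse_args(argv)
    return MODEL_RUNNERS[args.model]()
```

The `if __name__ == "__main__": main()` guard from the original stays as it is. A test can call main(["--model", "svm"]) directly, and an unknown model name still makes argparse exit with a usage error.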