YotamAflalo / MLOps-TeleChurnPredictor

This repo was created as an intermediate project in the MLOps course at Naya College. The goal of the project is to build a customer churn prediction service, with support for both API and batch prediction. The project includes a CI/CD pipeline, monitoring, a Grafana dashboard, and more.

MLOps Mid-Project: ML Model Deployment for Customer Churn Prediction

This project implements a machine learning model for customer churn prediction, utilizing FastAPI and Apache Beam. It includes a comprehensive MLOps pipeline with monitoring, batch processing, and CI/CD integration.

The presentation of this project can be found here: Gamma Link

Features

Architecture

  1. Data Collection: Input comes from CSV files in data/raw or from database queries (fetching only rows that have not yet been predicted, based on the timestamp of the last prediction)
  2. Data Preprocessing: Data cleaning and transformation for model input (implemented in beam_preprocessing.py)
  3. Model Execution: Running the RandomForestClassifier model on preprocessed data
  4. Output: Processed data and predictions saved in the data/batch_results folder and/or database
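
A minimal Apache Beam sketch of these four stages is shown below. The input file name and the helper functions (clean_row, run_model) are illustrative assumptions; the project's real preprocessing lives in beam_preprocessing.py, and the real model step uses the pickled RandomForestClassifier described under Batch Processing.

import json

import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToText


def clean_row(line: str) -> dict:
    # Stand-in for the real cleaning/transformation step.
    total_charges, contract, phone_service, tenure = line.split(",")
    return {
        "TotalCharges": float(total_charges or 0.0),
        "Contract": contract,
        "PhoneService": phone_service,
        "tenure": int(tenure),
    }


def run_model(row: dict) -> dict:
    # Stand-in for model execution; the real step applies the trained classifier.
    row["prediction"] = 0
    return row


with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Collect" >> ReadFromText("data/raw/customers.csv", skip_header_lines=1)  # assumed file name
        | "Preprocess" >> beam.Map(clean_row)
        | "Predict" >> beam.Map(run_model)
        | "Format" >> beam.Map(json.dumps)
        | "Output" >> WriteToText("data/batch_results/predictions")
    )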

System Architecture Diagram

graph TD
    A[<b>API</b>] -->|Saves predictions| B[(<b>PostgreSQL DB</b>)]
    A -->|Monitored by| C[<b>Prometheus</b>]
    C -->|Visualized in| D[<b>Grafana</b>]
    A -.->|Logs stored in| B
    B -->|Daily API logs| E[<b>WhyLabs</b>]
    F[<b>Batch Processing</b>] <-->|Reads/Writes| B
    F <-->|Processes files| G[<b>Target Folder</b>]
    F -->|Daily logs & predictions| E
    H[<b>User</b>] -->|API requests| A
    I[<b>Cron Job</b>] -->|Triggers daily| F
    J[<b>MLOps Engineer</b>] -->|Views| D
    D -->|Alerts| J

    classDef primary fill:#e6f3ff,stroke:#333,stroke-width:2px;
    classDef secondary fill:#d0e0e3,stroke:#333,stroke-width:2px;
    classDef tertiary fill:#fff2cc,stroke:#333,stroke-width:2px;
    classDef quaternary fill:#f2e6ff,stroke:#333,stroke-width:2px;
    classDef default color:#000000;

    class A,F primary;
    class B,G secondary;
    class C,D,E tertiary;
    class H,I,J quaternary;

Docker Compose Service Architecture

graph TD
    A[api] -->|depends on| B[db]
    A -->|connects to| C[prometheus]
    D[batch] -->|depends on| B
    C -->|visualized by| E[grafana]
    F[network]

    A -->|part of| F
    B -->|part of| F
    C -->|part of| F
    D -->|part of| F
    E -->|part of| F

    subgraph Volumes
        G[postgres_data]
        H[grafana_data]
    end

    B -->|uses| G
    E -->|uses| H

    classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px;
    classDef service fill:#AED6F1,stroke:#3498DB,stroke-width:2px;
    classDef db fill:#F9E79F,stroke:#F4D03F,stroke-width:2px;
    classDef monitoring fill:#D5F5E3,stroke:#2ECC71,stroke-width:2px;
    classDef network fill:#FADBD8,stroke:#E74C3C,stroke-width:2px;
    classDef volume fill:#8E44AD,stroke:#4A235A,stroke-width:2px,color:#FFFFFF;

    class A,D service;
    class B db;
    class C,E monitoring;
    class F network;
    class G,H volume;

API CI/CD Pipeline

graph TD
    A[Push to main branch<br>or Pull Request] --> B[Check out code]
    B --> C[Set up Python 3.10.12]
    C --> D[Install dependencies]
    D --> E[Run tests]
    E --> F[Build Docker image]
    F --> G{Is it a push<br>to main?}
    G -->|Yes| H[Log in to Docker Hub]
    H --> I[Push image to Docker Hub]
    G -->|No| J[End]
    I --> J

    classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px
    classDef trigger fill:#ff9999,stroke:#333,stroke-width:2px
    classDef step fill:#99ccff,stroke:#333,stroke-width:2px
    classDef decision fill:#ffcc99,stroke:#333,stroke-width:2px
    classDef endClass fill:#ccff99,stroke:#333,stroke-width:2px

    class A trigger
    class B,C,D,E,F,H,I step
    class G decision
    class J endClass

Prerequisites

Getting Started

Running the Full Service

  1. Create a '.env' file in the docker directory, based on the example.env.txt file

  2. Navigate to the docker directory:

    cd docker
  3. Build the Docker images:

    docker-compose build
  4. Start the services:

    docker-compose up
  5. Access the API documentation at http://localhost:8005/docs

Testing the API

Use the /predict/ POST endpoint with the following example body:

{
  "TotalCharges": "1889.5",
  "Contract": "One year",
  "PhoneService": "Yes",
  "tenure": 34
}

Expected response:

{
  "prediction": 0
}

(Indicates the client is not likely to churn soon)
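
For a scripted check instead of the Swagger UI, a small Python client along these lines should work, assuming the stack is running locally and the API is exposed on port 8005 as in the steps above:

import requests

payload = {
    "TotalCharges": "1889.5",
    "Contract": "One year",
    "PhoneService": "Yes",
    "tenure": 34,
}

# POST the example body to the prediction endpoint and print the JSON response.
response = requests.post("http://localhost:8005/predict/", json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # expected: {"prediction": 0}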

Batch Processing

The batch processing pipeline utilizes Apache Beam for efficient data processing. It runs daily at 12 PM, performing the following steps:

  1. Data retrieval from database or CSV files
  2. Data preprocessing
  3. Model execution using the pickled RandomForestClassifier
  4. Saving results back to the database or file system

Configure the batch job settings in the 'config' file.
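
One natural way to wire the model-execution step into Beam is a DoFn that loads the pickled RandomForestClassifier once per worker. The sketch below is an assumption about structure, not the repo's actual code; the model path and feature layout would come from the config.

import pickle

import apache_beam as beam
import pandas as pd


class RunChurnModel(beam.DoFn):
    def __init__(self, model_path: str = "models/churn_model.pkl"):  # assumed path
        self._model_path = model_path
        self._model = None

    def setup(self):
        # Runs once per worker, so the pickle is not reloaded for every element.
        with open(self._model_path, "rb") as f:
            self._model = pickle.load(f)

    def process(self, row: dict):
        # Assumes the row's columns already match the features used at training time.
        features = pd.DataFrame([row])
        row["prediction"] = int(self._model.predict(features)[0])
        yield row

In a pipeline this would replace a plain mapping step, e.g. beam.ParDo(RunChurnModel()).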

Real-time API

The FastAPI application provides real-time predictions for the marketing server. It uses the same preprocessing steps and model as the batch process to ensure consistency.
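
The endpoint roughly has the shape sketched below; the field names follow the example request body above, while the actual schema, preprocessing, and model loading live in the repo's API code.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChurnRequest(BaseModel):
    TotalCharges: str
    Contract: str
    PhoneService: str
    tenure: int


@app.post("/predict/")
def predict(request: ChurnRequest) -> dict:
    # The real service runs the shared preprocessing and the pickled
    # RandomForestClassifier; a fixed value stands in here.
    return {"prediction": 0}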

Monitoring and Alerting

Prometheus

Prometheus is used to collect metrics from both the API and the batch processing pipeline. Key metrics cover API performance, batch processing behavior, model performance over time, and data drift indicators.
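
As a rough idea of how the API can expose metrics for Prometheus to scrape, here is a prometheus_client sketch; the metric names are made up for the example, and the repo may instrument the service differently.

from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # endpoint scraped by the Prometheus service

PREDICTIONS = Counter("churn_predictions_total", "Predictions served, by label", ["label"])
LATENCY = Histogram("churn_prediction_latency_seconds", "Time spent producing a prediction")


@app.post("/predict/")
def predict() -> dict:
    with LATENCY.time():
        prediction = 0  # stand-in for the real model call
    PREDICTIONS.labels(label=str(prediction)).inc()
    return {"prediction": prediction}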

Grafana

  1. Access Grafana at http://localhost:3000/
  2. Navigate to "Dashboards"
  3. Explore the pre-configured dashboards for:
    • API performance
    • Batch processing metrics
    • Model performance over time
    • Data drift indicators

Grafana Dashboard

Alerts are configured in Grafana to notify of any anomalies or issues in the system.

Database

PostgreSQL is used for storing predictions and raw data.

Data Drift Monitoring

whylogs is implemented for data drift detection. Monitor drift metrics through the Grafana dashboard or the custom reports generated on the WhyLabs website.
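
A minimal profiling sketch with the whylogs pandas API (assuming whylogs v1) might look like this; the project's actual profiling and WhyLabs upload logic may differ.

import pandas as pd
import whylogs as why

# A day's worth of prediction inputs, e.g. read back from PostgreSQL or data/batch_results.
daily_batch = pd.DataFrame(
    [{"TotalCharges": 1889.5, "Contract": "One year", "PhoneService": "Yes", "tenure": 34}]
)

results = why.log(daily_batch)          # profile the batch
profile_view = results.view()           # summary statistics used for drift comparison
print(profile_view.to_pandas().head())  # inspect locally; uploads to WhyLabs use the writer API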

CI/CD Pipeline

The project includes a full CI/CD pipeline configured with GitHub Actions. View the workflow files in the .github/workflows/ directory.

Configuration

To modify input parameters or other configurations, please refer to the configuration files in the config/ directory.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

For more information or support, please open an issue in the GitHub repository.