CirSandro / PFEE-ZAMA

A capstone project in collaboration with Zama to develop a privacy-preserving machine learning model using PPML, FHE and Concrete ML to detect banking frauds.
1 stars 1 forks source link
data-science fhe homomorphic-encryption machine-learning ppml privacy python sckiit-learn

PFEE-ZAMA

Table of Contents

Introduction

As the use of sensitive personal data in machine learning models becomes increasingly prevalent, ensuring data privacy and security is paramount. PFEE-ZAMA addresses these concerns by leveraging advanced techniques such as Privacy-Preserving Machine Learning (PPML) and Fully Homomorphic Encryption (FHE). This project enables secure predictions without compromising the confidentiality of user data.

About This Project

Project Overview

PFEE-ZAMA is a capstone project (Projet de Fin d'Études Encadré) developed in collaboration with Zama, a leader in homomorphic encryption. The primary objective is to explore and implement Privacy-Preserving Machine Learning (PPML) and Fully Homomorphic Encryption (FHE) using Concrete ML, an open-source library developed by Zama. The project focuses on creating a machine learning model capable of making secure predictions while ensuring data privacy.

Project Goals

Key Technologies

Project Contributions

  1. Understanding and Implementing PPML and FHE: Conducted an in-depth study of PPML and FHE principles and their practical implementation.
  2. Model Development: Created and trained a machine learning model using Concrete ML to perform secure predictions.
  3. Data Privacy: Ensured that all data used in the predictions remained encrypted, maintaining user confidentiality and data integrity.
  4. Client-Server Architecture: Developed a scalable client-server system to handle encrypted prediction requests.
  5. Documentation: Provided comprehensive documentation using Sphinx for ease of understanding and deployment.

Deployment

Prerequisites

Before deploying PFEE-ZAMA, ensure that you have the following installed on your system:

Setup

  1. Clone the Repository

    git clone https://github.com/yourusername/pfee-zama.git
    cd pfee-zama
  2. Install Dependencies

    Use the provided Makefile to install all necessary dependencies.

    make setup

    This command will install all packages listed in requirements.txt using pip.

Training the Model

Before running the server and client, train the machine learning model using the following command:

make train

This will execute the models/FHEModel.py script, which:

Running the Server and Client

PFEE-ZAMA follows a client-server architecture where the server handles encrypted prediction requests, and the client manages user interactions and data encryption.

  1. Run the Server

    make run_server

    This command starts the FastAPI server on http://0.0.0.0:8000, ready to receive encrypted prediction requests.

  2. Run the Client

    In a new terminal window, execute:

    make run_client

    This starts the FastAPI client on http://127.0.0.1:8001. The client will send encrypted evaluation keys to the server upon startup.

Workflow

Prediction Process

When a user makes a prediction via the API client, the following workflow occurs:

  1. User Input: The user provides input data through the client’s API endpoint (/predict).

  2. Data Preprocessing: The client application scales the input data using a pre-trained scaler (scaler.pkl).

  3. Data Encryption: The scaled data is encrypted using Concrete ML's FHE capabilities. The encryption process ensures that the data remains confidential during computation.

  4. Sending Encrypted Data: The encrypted data is sent to the server’s /predict endpoint via a POST request.

  5. Server Processing: The server receives the encrypted data and uses the encrypted model to perform the prediction directly on the encrypted input without decrypting it.

  6. Encrypted Prediction: The server returns the encrypted prediction result to the client.

  7. Decryption: The client decrypts the received prediction to obtain the final result.

  8. Result Delivery: The client returns the decrypted prediction to the user.

This workflow ensures that at no point is the sensitive user data exposed in an unencrypted form, maintaining data privacy and security throughout the process.

Usage

Using the Makefile

The Makefile simplifies common tasks such as setting up the environment, training the model, running the server and client, and testing. Below are the available commands:

Making Predictions via the API Client

To make predictions using the client API, follow these steps:

  1. Ensure Server and Client are Running

    Make sure both the server and client are up and running using the Makefile commands mentioned above.

  2. Send a Prediction Request

    You can use tools like curl, HTTPie, or Postman to send a POST request to the client’s /predict endpoint. Below is an example using curl:

    curl -X POST "http://127.0.0.1:8001/predict" \
    -H "Content-Type: application/json" \
    -d '{
         "distance_from_home": 10.5,
         "distance_from_last_transaction": 5.2,
         "ratio_to_median_purchase_price": 1.3,
         "repeat_retailer": 1,
         "used_chip": 0,
         "used_pin_number": 1,
         "online_order": 0
       }'
  3. Receive the Prediction

    The client will process the input, encrypt it, send it to the server, and return the decrypted prediction. The response will look like:

    {
     "prediction": 0
    }

    Here, 0 could represent a non-fraudulent transaction, and 1 could represent a fraudulent one, depending on your model's encoding.

Continuous Integration

PFEE-ZAMA utilizes GitHub Actions for continuous integration to ensure code quality and maintainability.

Configuration

The CI workflow is defined in .github/workflows/ci.yml. It includes steps for:

Validating Commit Messages

Commit messages must adhere to the following format to maintain consistency and clarity:

A commit message validation hook is included and executed during the CI process to enforce these conventions.

Installation of Dependencies

Dependencies are managed via requirements.txt. To install them manually, use:

pip install -r requirements.txt

Dataset

The project uses the Credit Card Fraud Detection dataset from Kaggle. The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions.

This setup ensures that you can deploy the PFEE-ZAMA system efficiently and start making secure predictions with ease.


For more detailed documentation, refer to the PFEE-ZAMA Documentation.