A capstone project in collaboration with Zama to develop a privacy-preserving machine learning (PPML) model that detects banking fraud, using Fully Homomorphic Encryption (FHE) and Concrete ML.
In this notebook, I conducted a thorough analysis of the credit card fraud detection dataset. The workflow includes:
Data Loading and Preprocessing: Detailed analysis of the dataset's structure, handling missing values, scaling features, and addressing class imbalance using techniques like SMOTE.
Exploratory Data Analysis (EDA): Visualization of key features, identifying trends, and understanding the relationship between the features and the target variable (fraud).
Model Training and Evaluation: I trained several machine learning models (e.g., Random Forest, Logistic Regression, Decision Tree), including both traditional and Fully Homomorphic Encryption (FHE)-based models. Each model was evaluated on accuracy and AUC-ROC, and the FHE models were compared against the traditional ones to understand the trade-offs between security and performance.
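To make two of the ideas in the workflow above concrete, here is a minimal, standard-library-only sketch. It is not the notebook's code: the function names (smote_like_oversample, accuracy, roc_auc) and the toy data are hypothetical, and the notebook presumably relies on library implementations instead. The first function shows the interpolation step at the heart of SMOTE; the other two show why AUC-ROC is reported alongside accuracy on a heavily imbalanced fraud dataset.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 98 legitimate transactions (0) and 2 frauds (1).
y_true = [0] * 98 + [1] * 2

# A "model" that never flags fraud looks accurate but cannot rank at all.
print(accuracy(y_true, [0] * 100))   # 0.98
print(roc_auc(y_true, [0.0] * 100))  # 0.5

# Oversample a hypothetical 2-feature minority class with 5 synthetic points.
frauds = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(len(smote_like_oversample(frauds, n_new=5)))  # 5
```

The constant predictor reaches 98% accuracy while its AUC stays at chance level, which is why accuracy alone says little on this kind of dataset. Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.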
Added a .gitignore file to exclude irrelevant files and directories such as IDE configurations (VSCode, Jupyter), temporary Python files, and environment-specific files (caches, logs, models, checkpoints, etc.).
Added the card_credit.ipynb notebook