Harshit-code-tech commented 3 days ago

Pull Request for ML-Crate 💡

Issue Title: FastTag Fraud Detection

Info about the related issue (Aim of the project) : Implementing a machine learning model to detect fraudulent transactions in the FASTag system, enhancing security and efficiency in electronic toll collection.
Name: Harshit Ghosh
Email ID for further communication: harshitghosh7@gmail.com
GitHub ID: Harshit Ghosh
Identify yourself: Social Summer Of Code Season 3 Contributor

Closes: #679

Describe the add-ons or changes you've made 📃

Implemented a machine learning pipeline for fraud detection in the FASTag system. Added feature engineering, model training, evaluation, and a Streamlit app for real-time predictions.

Type of change ☑️

What sort of change have you made:

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Code style update (formatting, local variables)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update

How Has This Been Tested? ⚙️

The following steps were taken to test the FASTag Fraud Detection model:

Unit Testing:
- Developed and executed unit tests for functions and methods involved in data preprocessing, feature engineering, and model training.
- Verified correct functionality for various inputs and edge cases.
Integration Testing:
- Conducted integration tests to ensure seamless interaction between components (data preprocessing, model training, and evaluation).
- Tested the complete pipeline from data loading to model prediction.
Model Evaluation:
- Evaluated models using metrics such as F1 Score, Accuracy, and ROC AUC.
- Implemented cross-validation to ensure model robustness and prevent overfitting.
Hyperparameter Tuning:
- Utilized Grid Search for hyperparameter tuning to optimize model performance.
- Tested multiple combinations of parameters for each model.
Exploratory Data Analysis (EDA) Validation:
- Reviewed visualizations to confirm EDA findings, ensuring insights into data distribution and feature relationships.
Web Application Testing:
- Integrated the selected model into a Streamlit web app.
- Conducted end-to-end testing to verify real-time fraud prediction functionality.
Documentation Review:
- Updated project documentation to reflect enhancements.
- Ensured that instructions for running the model and understanding results are clear.
Code Review:
- Conducted a self-review of the code for adherence to project guidelines.
- Added comments for complex sections to improve readability and maintainability.
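
For reference, the three metrics named under Model Evaluation can be computed as in the minimal sketch below. It is pure Python for illustration only: the labels, scores, and the 0.5 threshold are made-up assumptions, and the PR's own code may well rely on library implementations instead.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fraud) and model scores, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while ROC AUC is computed from the raw scores, so the three can rank models differently.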

Verification

Local Testing: Verified the functionality of all components locally.
Peer Review: Collaborated on peer reviews to gather feedback and identify potential issues.

Checklist: ☑️

[x] My code follows the guidelines of this project.
[x] I have performed a self-review of my own code.
[x] I have commented my code, particularly wherever it was hard to understand.
[x] I have made corresponding changes to the documentation.
[x] My changes generate no new warnings.
[x] I have added things that prove my fix is effective or that my feature works.
[ ] Any dependent changes have been merged and published in downstream modules.

github-actions[bot] commented 3 days ago

Our team will soon review your PR. Thanks @Harshit-code-tech :)

abhisheks008 commented 2 days ago

Hi @Harshit-code-tech, I see that you concluded SVM is the best-fitting model, but going by the accuracy scores, XGB has the better accuracy.

Harshit-code-tech commented 2 days ago

@abhisheks008 Sir, as mentioned in the README, SVM focuses on maximizing the margin between classes, which helps create a more clearly defined decision boundary and reduces the risk of misclassification.

While XGBoost has a slightly better ROC AUC Score and comparable F1-Score and Accuracy, SVM’s performance is more balanced and may generalize better in real-world scenarios.
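
One way to make a comparison like this concrete is to list which model leads on each metric rather than on accuracy alone. The sketch below is only an illustration: `model_a`, `model_b`, and every score in `results` are placeholders invented for the example, not the numbers reported in this PR's README.

```python
from typing import Dict, List

# Placeholder scores for two hypothetical models; NOT results from this PR.
results: Dict[str, Dict[str, float]] = {
    "model_a": {"accuracy": 0.97, "f1": 0.98, "roc_auc": 0.95},
    "model_b": {"accuracy": 0.98, "f1": 0.98, "roc_auc": 0.97},
}


def best_by_metric(results: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """For each metric, return the model(s) with the highest score (ties are kept)."""
    metrics = list(next(iter(results.values())))
    winners: Dict[str, List[str]] = {}
    for metric in metrics:
        top = max(scores[metric] for scores in results.values())
        winners[metric] = sorted(
            name for name, scores in results.items() if scores[metric] == top
        )
    return winners


for metric, names in best_by_metric(results).items():
    print(f"{metric}: {', '.join(names)}")
```

A table like this shows at a glance whether one model leads on every metric or whether the choice depends on which metric matters most for the use case.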

abhisheks008 / ML-Crate

Fastag Fraud Detection System #688