Learning Goals
Gain proficiency in Exploratory Data Analysis
Understand data fraud analysis techniques
Learn to identify anomalies in a dataset
Exercise Statement
Conduct data fraud analysis on a battery swap service dataset. The dataset contains information about battery swaps at various stations across a city. Your objective is to identify potential fraudulent activities, such as revenue losses due to inconsistencies in swap data, and propose solutions for detection and prevention.
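As a first pass at spotting "inconsistencies in swap data", one option is to compare what a station logged against what was billed and flag stations whose gap is unusually large. The sketch below is only an illustration under stated assumptions: the proposal does not list the dataset's columns, so `station_id`, `swaps_logged`, and `swaps_billed` are hypothetical field names, and the sample values are made up. It uses a robust z-score based on the median absolute deviation and only the Python standard library.

```python
from statistics import median

# Hypothetical records: the real dataset's column names are not given in this
# proposal, so `station_id`, `swaps_logged`, and `swaps_billed` are assumptions.
records = [
    {"station_id": "S01", "swaps_logged": 120, "swaps_billed": 119},
    {"station_id": "S02", "swaps_logged": 98, "swaps_billed": 98},
    {"station_id": "S03", "swaps_logged": 143, "swaps_billed": 141},
    {"station_id": "S04", "swaps_logged": 110, "swaps_billed": 82},  # large gap
    {"station_id": "S05", "swaps_logged": 105, "swaps_billed": 104},
    {"station_id": "S06", "swaps_logged": 131, "swaps_billed": 130},
]


def flag_inconsistent_stations(rows, threshold=3.5):
    """Return station ids whose logged-minus-billed gap is a robust outlier."""
    gaps = {r["station_id"]: r["swaps_logged"] - r["swaps_billed"] for r in rows}
    med = median(gaps.values())
    # Median absolute deviation (MAD): a spread estimate that a few extreme
    # stations cannot inflate, unlike the standard deviation.
    mad = median(abs(g - med) for g in gaps.values())
    if mad == 0:
        # Degenerate spread: flag any gap that differs from the median at all.
        return sorted(s for s, g in gaps.items() if g != med)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the gaps are roughly normally distributed.
    return sorted(
        s for s, g in gaps.items() if abs(0.6745 * (g - med) / mad) > threshold
    )


flagged = flag_inconsistent_stations(records)
print(flagged)
```

The threshold of 3.5 is a conventional starting point for this kind of score, not a value taken from the dataset; it would need tuning against real swap records.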
Prerequisites
Basic understanding of data manipulation with Python and Pandas
Familiarity with K-Means clustering and Isolation Forest models
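For readers who have not used these two models, the following is a minimal sketch of how they might be applied to anomaly detection with scikit-learn. It is not the proposed solution: the data is synthetic, and the two features (energy delivered and amount billed per swap) are assumptions, since the proposal does not describe the dataset's columns. The K-Means part uses one possible heuristic, treating very small clusters as suspicious, because K-Means is not an anomaly detector on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per-swap features (assumed, not from the real data).
# Column 0: energy delivered (kWh), column 1: amount billed.
normal = np.column_stack([
    rng.normal(2.0, 0.2, 200),
    rng.normal(50.0, 5.0, 200),
])
# Three hand-placed suspicious swaps: a lot of energy, almost nothing billed.
suspicious = np.array([[4.5, 2.0], [5.0, 0.0], [5.5, 1.0]])

# Scaling matters for K-Means (distance based); Isolation Forest is
# insensitive to per-feature rescaling, so sharing the scaled matrix is fine.
X = StandardScaler().fit_transform(np.vstack([normal, suspicious]))

# Isolation Forest: fit_predict() returns -1 for anomalies and 1 for inliers.
# `contamination` is the assumed share of anomalous rows.
iso = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(X)

# K-Means heuristic: flag every row that belongs to a cluster holding
# fewer than 5% of all rows.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_sizes = np.bincount(km.labels_, minlength=km.n_clusters)
small_cluster = cluster_sizes < 0.05 * len(X)
km_flags = small_cluster[km.labels_]

print("Isolation Forest flagged rows:", np.flatnonzero(iso_labels == -1))
print("K-Means small-cluster rows:", np.flatnonzero(km_flags))
```

Both `contamination=0.02` and the 5% cluster-size cutoff are illustrative choices; on real swap data they would need to be justified by the exploratory analysis.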
Data source/summary:
This project presents an opportunity to apply data fraud analysis techniques to detect and address potential fraudulent activities. Additionally, you'll propose solutions for automating fraud detection and creating instantaneous alerts to mitigate revenue losses.
(Optional) Suggest/Propose Solutions
I have a solution that uses K-Means and Isolation Forest, and I would be happy to open a pull request with it.