RubixML / ML

A high-level machine learning and deep learning library for the PHP language.
https://rubixml.com
MIT License

Question: which model to use for fraud prediction? #274

Status: Open. fhferreira opened this issue 1 year ago.

fhferreira commented 1 year ago

I am looking for a way to prevent fraudsters from creating stores/e-commerce sites whose only purpose is to sell products fraudulently.

Example: Product: Consul stove. Listed price: 100. Typical retail price: 500.

Product: Electrolux washing machine. Listed price: 119. Typical retail price: 900.

I am new to machine learning, so I would appreciate a suggestion.
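Both examples come down to one signal: the listed price as a fraction of the typical retail price. A minimal sketch of that feature follows, in Python for illustration (RubixML is a PHP library; `price_ratio` is a hypothetical helper, not part of its API). It assumes you already have a trustworthy reference price for each product.

```python
def price_ratio(listed_price, reference_price):
    """Listed price as a fraction of the typical retail price (hypothetical feature).

    Values far below 1.0 mean the item is offered at a steep discount.
    """
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return listed_price / reference_price


# The two listings from the question: (name, listed price, typical retail price).
listings = [
    ("Consul stove", 100, 500),
    ("Electrolux washing machine", 119, 900),
]
for name, listed, reference in listings:
    print(f"{name}: {price_ratio(listed, reference):.3f}")  # 0.200 and 0.132
```

A ratio like this can be one column in a feature vector alongside other store-level signals, which is what either an anomaly detector or a classifier would consume.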

andrewdalpino commented 1 year ago

Often, fraud detection can be framed as anomaly detection, which is an unsupervised approach. The problem with a supervised approach is that it is sometimes not practical to accumulate enough labeled samples that represent fraud. The prior probability of fraud is just too low, i.e. most people are honest. Fortunately, most Anomaly Detectors account for this skew through the contamination hyper-parameter, which sets the proportion of outliers assumed to be present in the training set.

https://docs.rubixml.com/2.0/what-is-machine-learning.html#anomaly-detection
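To make the role of contamination concrete, here is a small Python sketch (not RubixML's API; `flag_by_contamination` is a hypothetical name). It assumes each sample already has an anomaly score where higher means more anomalous, and uses contamination to decide how many of the top-scoring samples get flagged.

```python
def flag_by_contamination(scores, contamination):
    """Flag roughly the top `contamination` fraction of samples as anomalies (1).

    Assumes a higher score means "more anomalous". All samples tied at the
    cutoff are flagged, so ties can push the count slightly above the target.
    """
    # This sketch only accepts contamination strictly between 0 and 0.5.
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination should be between 0 and 0.5")
    n_flag = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [1 if s >= cutoff else 0 for s in scores]


# Made-up scores: 95 ordinary samples and 5 clear outliers.
scores = [i / 100 for i in range(95)] + [10, 11, 12, 13, 14]
flags = flag_by_contamination(scores, contamination=0.05)
print(sum(flags))  # → 5
```

Contamination is a prior assumption, not something learned: with `contamination=0.1` the same scores would also flag the five highest ordinary samples, so it is worth setting it close to the fraud rate you actually expect.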

If you take this approach, you can start with a simple Anomaly Detector such as Gaussian MLE; if you need more flexibility, Loda and Isolation Forest work pretty well.
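The idea behind a Gaussian maximum-likelihood detector can be sketched as follows: assume each feature is independent and normally distributed, fit a mean and variance per feature, score samples by negative log-likelihood, and place the threshold so that about `contamination` of the training samples fall above it. The class below is a from-scratch Python illustration with hypothetical names, not RubixML's GaussianMLE implementation, and the training values are made up.

```python
import math


class GaussianMLESketch:
    """Per-feature Gaussian density model; low likelihood = more anomalous."""

    def __init__(self, contamination=0.1, smoothing=1e-9):
        self.contamination = contamination
        self.smoothing = smoothing  # added to each variance so it is never zero

    def fit(self, samples):
        n, d = len(samples), len(samples[0])
        # Maximum-likelihood estimates of each feature's mean and variance.
        self.means = [sum(row[j] for row in samples) / n for j in range(d)]
        self.variances = [
            sum((row[j] - self.means[j]) ** 2 for row in samples) / n + self.smoothing
            for j in range(d)
        ]
        # Threshold: the score of the n_flag-th most anomalous training sample.
        n_flag = max(1, round(self.contamination * n))
        self.threshold = sorted(self.score(samples), reverse=True)[n_flag - 1]
        return self

    def score(self, samples):
        """Negative log-likelihood of each sample under the fitted Gaussians."""
        scores = []
        for row in samples:
            nll = 0.0
            for x, mu, var in zip(row, self.means, self.variances):
                nll += 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
            scores.append(nll)
        return scores

    def predict(self, samples):
        """1 = anomaly, 0 = inlier."""
        return [1 if s >= self.threshold else 0 for s in self.score(samples)]


# Toy training data: price ratios of presumably honest listings (made-up values).
honest = [[0.95], [1.0], [1.05], [0.98], [1.02], [0.97], [1.03], [0.99], [1.01], [1.0]]
detector = GaussianMLESketch(contamination=0.1).fit(honest)
print(detector.predict([[1.0], [100 / 500], [119 / 900]]))  # → [0, 1, 1]
```

Because features are treated as independent, this kind of detector is cheap and easy to reason about, but it cannot capture interactions between features; that is where more flexible detectors such as Loda or Isolation Forest come in.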

If you go with a supervised approach, you can train a classifier to label samples "fraud" or "not fraud", but be mindful if your dataset is highly imbalanced (mostly non-fraud samples). Some classifiers, such as Random Forest, can compensate for imbalanced datasets, but that is no substitute for actually having more data representing the fraud case.

https://docs.rubixml.com/2.0/what-is-machine-learning.html#classification
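One common way to compensate for imbalance is inverse-frequency class weighting, where each class gets weight n / (k · n_c) so every class carries the same total weight. The Python sketch below (hypothetical helper, made-up label counts) only computes the weights; it is not a description of how RubixML's Random Forest handles imbalance internally.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * n_c) so all classes carry equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Made-up label distribution: 1% fraud.
labels = ["not fraud"] * 990 + ["fraud"] * 10
weights = inverse_frequency_weights(labels)
print(weights["fraud"])  # → 50.0
```

Each fraud sample now counts about 100 times as much as a non-fraud sample, so both classes contribute roughly 500 units of total weight. Weights like these can drive a weighted loss or a weighted bootstrap, but they only reweight the ten fraud examples you have; they cannot add the variety that more real fraud data would.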

Hope this helps!

fhferreira commented 1 year ago

@andrewdalpino

Thanks, man, that helped a lot.