fhswf / MLPro

MLPro - The Integrative Middleware Framework for Standardized Machine Learning in Python
https://mlpro.readthedocs.io/
Apache License 2.0

OA: Benchmark stream provider for all types of anomalies #718

Closed by detlefarend 2 months ago

detlefarend commented 1 year ago

Description/Motivation

A new native benchmark stream provider for all types of anomalies shall be implemented.

Task list

Related issues

#911, #795

Cross references

https://www.sciencedirect.com/science/article/abs/pii/S2452414X21000145
https://ieee-dataport.org/documents/tennessee-eastman-simulation-dataset
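To make the requirement more concrete, here is a minimal sketch of what a synthetic benchmark stream with labelled anomalies could look like. It is not MLPro code and does not use the MLPro stream API. The function name `anomaly_stream`, the label names, and the positions and magnitudes of the anomalies are all hypothetical. It also assumes that "all types" covers at least point anomalies, group anomalies, and drift.

```python
import random
from typing import Iterator, Tuple


def anomaly_stream(n: int = 1000, seed: int = 42) -> Iterator[Tuple[float, str]]:
    """Yield (value, label) pairs from a 1-D Gaussian stream.

    Hypothetical sketch: label is one of 'normal', 'point', 'group', 'drift'.
    """
    rng = random.Random(seed)  # fixed seed keeps the benchmark reproducible
    for i in range(n):
        value = rng.gauss(0.0, 1.0)  # baseline: standard normal noise
        label = "normal"
        if 300 <= i < 320:
            # group anomaly: a short contiguous segment with a shifted mean
            value += 8.0
            label = "group"
        elif i >= 700:
            # drift: the mean moves away linearly from index 700 onwards
            value += 0.02 * (i - 700)
            label = "drift"
        elif i > 0 and i % 97 == 0:
            # point anomaly: an isolated spike
            value += 15.0
            label = "point"
        yield value, label
```

Because the ground-truth label travels with every instance, a detector's output can be scored per anomaly type, for example by collecting `[label for _, label in anomaly_stream()]` and comparing it with the detections.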

detlefarend commented 1 year ago

@syam47 To be discussed. Could you please add this topic to the agenda of our next call? Maybe you could look for existing benchmark streams in the literature (e.g. the Tennessee Eastman Process (TEP) and others)...

syamrajsatheesh commented 1 year ago

Yes, I will do that.

syamrajsatheesh commented 1 year ago

Datasets that can be used for anomaly detection (AD) testing and benchmarking

  1. Yahoo Labs Webscope - the Yahoo Network Traffic dataset https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67&guccounter=1

  2. Numenta Anomaly Benchmark https://github.com/numenta/NAB

  3. KDD Cup 1999 dataset - a network intrusion detection dataset used for the KDD Cup 1999 competition. It is derived from the 1998 DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Labs, in which network traffic was recorded while simulating a typical US Air Force LAN. Besides normal traffic, the dataset contains four categories of attacks: DoS (denial of service), Probe (surveillance and probing), R2L (unauthorized access from a remote machine), and U2R (unauthorized access to local superuser privileges). https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data

  4. UCI Adult dataset - census data for predicting whether a person's yearly income exceeds $50K https://archive.ics.uci.edu/ml/datasets/adult

  5. UCI Arrhythmia dataset - distinguishes between the presence and absence of cardiac arrhythmia and classifies it into one of 16 groups https://archive.ics.uci.edu/ml/datasets/arrhythmia

  6. KEEL Thyroid - whether a given patient is normal or suffers from hyperthyroidism or hypothyroidism https://sci2s.ugr.es/keel/dataset.php?cod=67
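For AD benchmarking, the multi-class labels of the KDD Cup 1999 data (item 3) usually have to be reduced to attack categories or to a binary normal/anomaly flag. A minimal sketch of that step, assuming the label names of the original training file (which carry a trailing dot, e.g. `smurf.`); the helper names `categorize` and `is_anomaly` are hypothetical:

```python
# Attack labels of the KDD Cup 1999 training data, grouped by category.
ATTACK_CATEGORY = {
    **dict.fromkeys(["back", "land", "neptune", "pod", "smurf", "teardrop"], "DoS"),
    **dict.fromkeys(["ipsweep", "nmap", "portsweep", "satan"], "Probe"),
    **dict.fromkeys(
        ["ftp_write", "guess_passwd", "imap", "multihop",
         "phf", "spy", "warezclient", "warezmaster"], "R2L"),
    **dict.fromkeys(["buffer_overflow", "loadmodule", "perl", "rootkit"], "U2R"),
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to 'normal', an attack category, or 'unknown'."""
    name = label.strip().rstrip(".").lower()  # raw labels end with a dot
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")


def is_anomaly(label: str) -> bool:
    """Binary ground truth for AD: everything that is not normal traffic."""
    return categorize(label) != "normal"
```

Labels that are not in the mapping (for example attack types that only occur in the test data) come back as `unknown` and are therefore treated as anomalies by `is_anomaly`.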

syamrajsatheesh commented 1 year ago

Datasets for anomaly/outlier detection

  1. KDD Cup 1999 https://www.openml.org/search?type=data&status=active&id=1110&sort=runs https://www.openml.org/search?type=data&status=active&id=1113&sort=runs

  2. Credit card fraud detection https://www.openml.org/search?type=data&status=active&sort=runs&id=42397