Closed detlefarend closed 2 months ago
@syam47 to be discussed. Could you please add this topic to the agenda of our next call? Maybe you could search the literature for existing benchmark streams (e.g. the Tennessee Eastman Process, TEP, and others)...
Yes, I will do that.
Datasets which can be used for AD testing and benchmarking
Yahoo Labs Webscope - the Yahoo Network Traffic dataset https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67&guccounter=1
Numenta Anomaly Benchmark https://github.com/numenta/NAB
KDD Cup 1999 dataset - network intrusion detection dataset used in the KDD Cup 1999 competition. It is derived from the 1998 DARPA Intrusion Detection Evaluation Program, prepared by MIT Lincoln Labs, and consists of network traffic recorded while simulating a typical US Air Force LAN. It contains normal traffic as well as four categories of attacks: DoS (denial of service), Probe (surveillance and probing), R2L (unauthorized access from a remote machine), and U2R (unauthorized access to local superuser privileges). https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data
UCI Adult dataset - census data for predicting whether yearly income exceeds $50K https://archive.ics.uci.edu/ml/datasets/adult
UCI Arrhythmia dataset - distinguish between the presence and absence of cardiac arrhythmia and classify it into one of 16 groups https://archive.ics.uci.edu/ml/datasets/arrhythmia
KEEL Thyroid dataset - determine whether a given patient is normal or suffers from hyperthyroidism or hypothyroidism https://sci2s.ugr.es/keel/dataset.php?cod=67
Dataset for anomaly/outlier detection
Description/Motivation A new native benchmark stream provider for all types of anomalies shall be implemented.
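To make the discussion a bit more concrete, here is a minimal sketch of what such a stream could look like. It is not based on any existing API of this project: the function name `benchmark_stream` and all of its parameters are hypothetical, and it only covers point anomalies on a univariate Gaussian signal. Other anomaly types (group anomalies, drift, ...) would need their own injection logic.

```python
import random
from typing import Iterator, Tuple


def benchmark_stream(
    n_instances: int = 1000,
    anomaly_rate: float = 0.01,
    anomaly_offset: float = 10.0,
    seed: int = 42,
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_anomaly) pairs from a synthetic univariate stream.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    rng = random.Random(seed)  # local RNG keeps the stream reproducible
    for _ in range(n_instances):
        # Normal behaviour: standard Gaussian noise
        value = rng.gauss(0.0, 1.0)
        is_anomaly = rng.random() < anomaly_rate
        if is_anomaly:
            # Point anomaly: shift the value far outside the normal range
            value += anomaly_offset * rng.choice((-1.0, 1.0))
        yield value, is_anomaly


stream = list(benchmark_stream())
n_anomalies = sum(label for _, label in stream)
print(f"{len(stream)} instances, {n_anomalies} labelled anomalies")
```

Because every instance carries its ground-truth label, a detector's output can be scored directly against the stream, and the fixed seed makes benchmark runs repeatable.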
Task list
Related issues
#911, #795
Cross references https://www.sciencedirect.com/science/article/abs/pii/S2452414X21000145 https://ieee-dataport.org/documents/tennessee-eastman-simulation-dataset