
danieltan07 / dagmm

My attempt at reproducing the paper Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection


why only attack data during training #20

Status: Open. kristinww opened this issue 4 years ago.

kristinww commented 4 years ago

self.train = attack_data[randIdx[:N_train]]
self.train_labels = attack_labels[randIdx[:N_train]]

The training data is generated in data_loader.py, lines 38 and 39. Why is only the attack data used, and not the normal data?
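To make the question concrete, here is a minimal sketch of the kind of split the two quoted lines perform. It assumes the "attack" records carry label 0, that half of them are drawn at random for training, and that the test set is the remaining attack records plus every "normal" record. The function `split_train_test` and its parameters are hypothetical, not taken from data_loader.py.

```python
import numpy as np

def split_train_test(features, labels, train_label=0, train_frac=0.5, seed=0):
    """Hypothetical helper: train on a random subset of one class only,
    test on the rest of that class plus every sample of the other class."""
    rng = np.random.default_rng(seed)
    # Majority ("attack") records, used as the normal class for training.
    majority_x = features[labels == train_label]
    majority_y = labels[labels == train_label]
    # Minority records, assumed here to be the anomalies.
    minority_x = features[labels != train_label]
    minority_y = labels[labels != train_label]

    rand_idx = rng.permutation(len(majority_x))
    n_train = int(len(majority_x) * train_frac)

    train_x = majority_x[rand_idx[:n_train]]
    train_y = majority_y[rand_idx[:n_train]]

    # Held-out majority records plus all minority records form the test set.
    test_x = np.concatenate([majority_x[rand_idx[n_train:]], minority_x], axis=0)
    test_y = np.concatenate([majority_y[rand_idx[n_train:]], minority_y], axis=0)
    return train_x, train_y, test_x, test_y

# Toy data: 8 "attack" rows (label 0) and 2 "normal" rows (label 1).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
tr_x, tr_y, te_x, te_y = split_train_test(X, y)
print(tr_x.shape, te_x.shape)  # (4, 2) (6, 2)
```

Under these assumptions the model never sees a "normal" record during training; those records only appear at test time.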

r07921078 commented 4 years ago

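The DAGMM paper mentions that in the KDDCUP data the "normal" samples are a minority group (about 20%), so the "normal" ones are treated as anomalies in this task. The majority "attack" samples therefore play the role of normal data, which is why only they are used for training.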
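The relabeling described above amounts to "whichever class is rarer is the anomaly class". A minimal sketch, assuming raw string labels such as "normal" and "attack"; the helper `minority_as_anomaly` is hypothetical and the repo may instead ship labels that are already encoded this way.

```python
import numpy as np

def minority_as_anomaly(raw_labels):
    """Return 1 for samples of the least frequent class (treated as anomalies)
    and 0 for everything else, plus the name of that minority class."""
    classes, counts = np.unique(raw_labels, return_counts=True)
    minority = classes[np.argmin(counts)]
    return (raw_labels == minority).astype(int), minority

# Toy labels with a 20% "normal" share, echoing the ratio quoted from the paper.
raw = np.array(["normal"] * 2 + ["attack"] * 8)
y, minority = minority_as_anomaly(raw)
print(minority, y.mean())  # normal 0.2
```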

chiachen-chang commented 1 year ago

It is suitable to use the KDD dataset for unsupervised learning tasks. The training set includes only positive instances (the majority "attack" records), while the test set contains both positive and negative ones (the minority "normal" records, treated as anomalies). Note that the positive instances in the test set come from the same class as those in the training set, which is what makes this setup appropriate for unsupervised anomaly detection.
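The properties of this setup can be checked directly on a split. Below is a minimal sketch, assuming the training class has label 0 and that you have the original row indices of both sets; `check_one_class_split` is a hypothetical helper. The third check simply reports whether any sample appears in both sets, which is one way to verify how the training and test positives relate.

```python
import numpy as np

def check_one_class_split(train_y, test_y, train_idx, test_idx, train_label=0):
    """Hypothetical checks:
    - the training set holds only the class used for training,
    - the test set holds both classes,
    - whether the two sets share any original row index."""
    only_train_class = bool(np.all(train_y == train_label))
    both_in_test = len(np.unique(test_y)) == 2
    disjoint = np.intersect1d(train_idx, test_idx).size == 0
    return only_train_class, both_in_test, disjoint

# Toy split: rows 0-3 (all label 0) for training, rows 4-9 for testing.
y = np.array([0] * 8 + [1] * 2)
train_idx, test_idx = np.arange(4), np.arange(4, 10)
print(check_one_class_split(y[train_idx], y[test_idx], train_idx, test_idx))
# (True, True, True)
```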