Closed qihuagao closed 2 years ago
@qihuagao, can you please provide some code so we can reproduce the results you got? I ran the notebook myself, but still got an F1 score of 0.9754.
Thanks!
Thank you very much for the response. I just followed the link; the code is below.
import os
import logging
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix, f1_score
import tensorflow as tf
tf.keras.backend.clear_session()
from tensorflow.keras.layers import Dense, InputLayer
from alibi_detect.datasets import fetch_kdd
from alibi_detect.models.tensorflow.losses import elbo
from alibi_detect.od import OutlierVAE
from alibi_detect.utils.data import create_outlier_batch
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score, plot_feature_outlier_tabular, plot_roc
logger = tf.get_logger()
logger.setLevel(logging.ERROR)
kddcup = fetch_kdd(percent10=True) # only load 10% of the dataset
print(kddcup.data.shape, kddcup.target.shape)
np.random.seed(0)
normal_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=400000, perc_outlier=0)
X_train, y_train = normal_batch.data.astype('float'), normal_batch.target
print(X_train.shape, y_train.shape)
print('{}% outliers'.format(100 * y_train.mean()))
mean, stdev = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / stdev
load_outlier_detector = False
filepath = 'my_dir' # change to directory (absolute path) where model is downloaded
detector_type = 'outlier'
dataset = 'kddcup'
detector_name = 'OutlierVAE'
filepath = os.path.join(filepath, detector_name)
if load_outlier_detector:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:  # define model, initialize, train and save outlier detector
    n_features = X_train.shape[1]
    latent_dim = 2
    encoder_net = tf.keras.Sequential(
        [
            InputLayer(input_shape=(n_features,)),
            Dense(20, activation=tf.nn.relu),
            Dense(15, activation=tf.nn.relu),
            Dense(7, activation=tf.nn.relu)
        ])
    decoder_net = tf.keras.Sequential(
        [
            InputLayer(input_shape=(latent_dim,)),
            Dense(7, activation=tf.nn.relu),
            Dense(15, activation=tf.nn.relu),
            Dense(20, activation=tf.nn.relu),
            Dense(n_features, activation=None)
        ])
    # initialize outlier detector
    od = OutlierVAE(threshold=None,  # threshold for outlier score
                    score_type='mse',  # use MSE of reconstruction error for outlier detection
                    encoder_net=encoder_net,  # can also pass VAE model instead
                    decoder_net=decoder_net,  # of separate encoder and decoder
                    latent_dim=latent_dim,
                    samples=5)
    # train
    od.fit(X_train,
           loss_fn=elbo,
           cov_elbo=dict(sim=.01),
           epochs=30,
           verbose=True)
    # save the trained outlier detector
    save_detector(od, filepath)
np.random.seed(0)
perc_outlier = 5
threshold_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=perc_outlier)
X_threshold, y_threshold = threshold_batch.data.astype('float'), threshold_batch.target
X_threshold = (X_threshold - mean) / stdev
print('{}% outliers'.format(100 * y_threshold.mean()))
od.infer_threshold(X_threshold, threshold_perc=100-perc_outlier)
print('New threshold: {}'.format(od.threshold))
save_detector(od, filepath)
np.random.seed(1)
outlier_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=10)
X_outlier, y_outlier = outlier_batch.data.astype('float'), outlier_batch.target
X_outlier = (X_outlier - mean) / stdev
print(X_outlier.shape, y_outlier.shape)
print('{}% outliers'.format(100 * y_outlier.mean()))
od_preds = od.predict(X_outlier,
                      outlier_type='instance',  # use 'feature' or 'instance' level
                      return_feature_score=True,  # scores used to determine outliers
                      return_instance_score=True)
print(list(od_preds['data'].keys()))
labels = outlier_batch.target_names
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {:.4f}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()
@qihuagao can you post what your environment is? I.e. Python version and versions of libraries (e.g. via `pip freeze` if using pip).
@qihuagao, thanks for sharing the code. We managed to replicate the issue. It seems the VAE is not training properly (we cannot tell for sure), as the lower score shows up only when training the model from scratch. We will look into it further and come back with an answer as soon as possible.
pip freeze
`
absl-py==0.14.1
alibi-detect==0.7.2
astunparse==1.6.3
cached-property==1.5.2
cachetools==4.2.4
certifi==2021.5.30
charset-normalizer==2.0.6
clang==5.0
click==8.0.3
cloudpickle==2.0.0
cycler==0.10.0
dataclasses==0.8
decorator==5.1.0
dill==0.3.4
dm-tree==0.1.6
filelock==3.3.1
flatbuffers==1.12
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.34.1
h5py==3.1.0
huggingface-hub==0.0.19
idna==3.2
imageio==2.9.0
importlib-metadata==4.8.1
joblib==1.0.1
keras==2.6.0
keras-nightly==2.5.0.dev2021032900
Keras-Preprocessing==1.1.2
kiwisolver==1.3.1
Markdown==3.3.4
matplotlib==3.3.4
networkx==2.5.1
numpy==1.19.5
oauthlib==3.1.1
opencv-python==4.5.4.58
opt-einsum==3.3.0
packaging==21.0
pandas==1.1.5
Pillow==8.4.0
protobuf==3.18.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==3.0.0
python-dateutil==2.8.2
pytz==2021.3
PyWavelets==1.1.1
PyYAML==6.0
regex==2021.10.23
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
sacremoses==0.0.46
scikit-image==0.17.2
scikit-learn==0.24.2
scipy==1.5.4
seaborn==0.11.2
six==1.15.0
sklearn==0.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.5.1
tensorflow-estimator==2.5.0
tensorflow-probability==0.12.2
termcolor==1.1.0
threadpoolctl==3.0.0
tifffile==2020.9.3
tokenizers==0.10.3
tqdm==4.62.3
transformers==4.11.3
typing-extensions==3.7.4.3
urllib3==1.26.7
Werkzeug==2.0.2
wrapt==1.12.1
zipp==3.6.0
`
@qihuagao, we performed the following experiments:
We trained 5 VAEs and used them to detect the outliers. We obtained the following F1 scores: [0.39, 0.96, 0.41, 0.59, 0.39].
We selected the first VAE (which obtained 0.39) and the second VAE (which obtained 0.96) and plotted the histograms of the instance scores (iscore), which are depicted in the following figure:
In both cases, we can identify 3 modes in the outlier score distribution. The problematic mode is the one on the left, which lies below the inferred threshold. The existence of these modes is also noted in the current notebook.
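To illustrate why a low-score mode hurts so much, here is a minimal sketch with synthetic scores (not the KDD Cup data and not the detector's real output). The helper names `percentile_threshold`, `make_batch`, and `f1` are hypothetical, and the nearest-rank percentile is only a stand-in for what `od.infer_threshold` does. When most outliers score below the percentile threshold, they are missed and normal instances near the top of the score range get flagged instead.

```python
def percentile_threshold(scores, perc):
    """Nearest-rank percentile; perc is an integer in 1..100 (stand-in for infer_threshold)."""
    ordered = sorted(scores)
    rank = (perc * len(ordered) + 99) // 100  # integer ceil(perc / 100 * n)
    return ordered[rank - 1]


def make_batch(n_normal, n_low_outliers, n_high_outliers):
    """Synthetic scores: normals spread over [0, 0.2), outliers in a low or a high mode."""
    scores = [0.2 * i / n_normal for i in range(n_normal)]
    labels = [0] * n_normal
    scores += [0.05] * n_low_outliers + [5.0] * n_high_outliers
    labels += [1] * (n_low_outliers + n_high_outliers)
    return scores, labels


def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def run(n_low_thr, n_high_thr, n_low_test, n_high_test):
    # Threshold batch: 1000 samples with 5% outliers; test batch: 1000 samples with 10% outliers.
    thr_scores, _ = make_batch(950, n_low_thr, n_high_thr)
    threshold = percentile_threshold(thr_scores, 95)
    test_scores, test_labels = make_batch(900, n_low_test, n_high_test)
    preds = [1 if s > threshold else 0 for s in test_scores]
    return f1(test_labels, preds)


# All outliers in the high-score mode: the threshold separates them cleanly.
f1_separated = run(0, 50, 0, 100)
# 80% of outliers fall in a low-score mode below the threshold.
f1_low_mode = run(40, 10, 80, 20)
print(f"separated: {f1_separated:.2f}, low mode present: {f1_low_mode:.2f}")
```

In this toy setup the threshold percentile is the same in both runs; only the position of the outlier scores relative to it changes.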
Unfortunately, this is an artifact of both the dataset and the method itself. We will follow up on this issue with a PR to ensure consistency across multiple executions, as the current implementation does not support it.
Please let me know if you need further clarifications and support. Thank you!
Issue to ensure consistency across multiple executions: #407
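As a rough user-side sketch of what fixing the random state can look like (this is not what #407 implements, and the helper name `set_global_seeds` is hypothetical), one can seed Python, NumPy, and TensorFlow before building and training the detector. Seeding may still not give bit-identical results on every platform, for example with nondeterministic GPU kernels.

```python
import random


def set_global_seeds(seed):
    """Seed the random number generators a training run typically draws from."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # used by create_outlier_batch-style sampling
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # weight initialisation and sampling ops
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed


# Call once before defining encoder_net / decoder_net and calling od.fit(...).
set_global_seeds(0)
first_draw = random.random()
set_global_seeds(0)
second_draw = random.random()
```

With the same seed, the two draws above come out identical, which is the behaviour one would want from repeated training runs as well.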
Thank you for sharing the result.
IMO, ensuring consistency can help to clarify this, but it will definitely not solve the problem; it only hides the insufficiency of the model. An F1 score below 0.5 is almost useless for any application. It would be better to improve the model's stability by optimizing beta, model complexity, etc., so that this notebook can serve as a baseline model for getting started with VAEs in this scenario.
Thanks again for sharing this great project. @RobertSamoilescu
@qihuagao, in addition to the previous comment, we did experiment with various hyperparameters (beta and sim), but the results were similar.
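When comparing hyperparameter settings under this much run-to-run variance, it may help to summarise several runs per setting rather than a single F1 score. A minimal sketch, using the five F1 scores reported earlier in this thread as input (the variable names are only illustrative):

```python
import statistics

# F1 scores from the five VAEs trained from scratch, as reported above.
f1_scores = [0.39, 0.96, 0.41, 0.59, 0.39]

summary = {
    "mean": statistics.mean(f1_scores),
    "median": statistics.median(f1_scores),
    "stdev": statistics.stdev(f1_scores),  # sample standard deviation
    "min": min(f1_scores),
    "max": max(f1_scores),
    "runs_below_0.5": sum(score < 0.5 for score in f1_scores),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

The median and the count of runs below 0.5 make it visible that the 0.96 run is the exception rather than the typical outcome.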
First, thank you for sharing this great project, which has helped me a lot.
I followed the procedure at the following link: https://docs.seldon.io/projects/alibi-detect/en/latest/examples/od_vae_kddcup.html but I got a very bad result.
Is anything missing from this doc? I only got an F1 score of 0.3407, while the doc reports 0.9754.
Looking forward to any response.
5.0% outliers
New threshold: 0.2997805822912754
(1000, 18) (1000,)
10.0% outliers
['instance_score', 'feature_score', 'is_outlier']
F1 score: 0.3407