MarijaStojchevska / Injective-Flow-for-Anomaly-Detection

Approximation of high-dimensional image distributions of different objects and textures from the MVTec dataset using injective flows. Goal: detection of out-of-distribution (OOD) images from the manufacturing domain.
Usage: train.py [--path PATH] [--dataset DATASET]
                [--testset_type TESTSET_TYPE] [--categoryName CLASSNAME]
                [--featureExtractor FEATUREEXTRACTOR] [--runTrain RUNTRAIN]
                [--num_epochs NUM_EPOCHS] [--threshold THRESHOLD]
                [--batch_size BATCH_SIZE] [--lr LR]

Arguments:
--path             path to the main directory
--dataset          which dataset to work with [MNIST, MVTEC]
--testset_type     which MNIST testset to work with [diagonal, off-diagonal, spots, cross, mixed, inverted, fashion]
--categoryName     which MVTEC category to work with [bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper]
--featureExtractor which feature extractor to use [densenet, vgg16]
--runTrain         whether to run the training procedure
--num_epochs       number of training epochs
--threshold        epoch at which the maximum-likelihood (ML) training should begin
--batch_size       size of the training batch
--lr               learning rate

Instructions for accessing the datasets:
- The whole MVTec AD dataset can be downloaded from the official MVTec website: https://www.mvtec.com/company/research/datasets/mvtec-ad
- The official link for downloading the MNIST dataset: http://yann.lecun.com/exdb/mnist/
- The Fashion MNIST dataset can be downloaded from the zalandoresearch/fashion-mnist GitHub repository: https://github.com/zalandoresearch/fashion-mnist/tree/master/data/fashion

Detection of Anomalous Images using Injective Flows

This README gives a rough overview of the reconstructive and generative power of the implemented injective model and illustrates how the model handles OOD data in high-dimensional image datasets.

Experiments conducted over the MVTec dataset are not included in the description. Contact me for more information.

1. Introduction

Traditional normalizing flows incur high computational costs when learning transformations of an input distribution, mainly because they operate at exactly the same dimension as the input, which is usually high. We used the idea behind the Trumpet model (https://github.com/swing-research/trumpets.git) to implement an injective flow that reduces the computational complexity of normalizing flows through an injective mapping. The main task to which we adapted this model is defect detection in the manufacturing industry, working with images of various objects and textures from the MVTec dataset. In other words, we used the injective flow to learn a distribution of healthy, non-defective images and to estimate the exact likelihood of a new image, based on which we decide whether the image belongs to the learned distribution (non-defective) or is an outlier (defective).

Figure 1.1: MVTec dataset - Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, and Carsten Steger. The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. International Journal of Computer Vision, 129(4):1038–1059, 2021.

2. Model Architecture

Figure 2.1: Model architecture. The input size 32x32x1 represents the dimension of the MVTec features extracted through the DenseNet-121 feature extractor.

The architecture of the injective flow allows efficient likelihood computation for a new sample with respect to two densities, each learned in a different output space of the model: the output space of the bijective map and the output space of the injective map. After training the model on nonanomalous examples, we therefore evaluated two likelihoods for each test instance, one per learned density, where a likelihood measures how probable the instance is under the learned distribution. Our goal was to check whether these two likelihoods differ drastically. Intuitively, we were checking whether something that looks like an anomaly in one space looks nonanomalous in the other, and vice versa. From the evaluation results summarized in an AUC-ROC curve, we observed that, for the purpose of outlier detection, the difference between the densities in the two spaces is small and can be neglected. This indicates that, as we expected, the injective mapping makes model evaluation faster but not better.
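
The comparison described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's evaluation code: the function name `roc_auc` and the toy log-likelihood values are hypothetical and were invented for this example. It assumes that the negative log-likelihood serves as the anomaly score and that the label 1 marks an anomalous sample.

```python
import numpy as np

def roc_auc(anomaly_scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    anomaly_scores: higher means more anomalous (here: negative log-likelihood).
    labels: 1 for anomalous, 0 for nonanomalous.
    """
    scores = np.asarray(anomaly_scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Fraction of (anomalous, nonanomalous) pairs ranked correctly; ties count half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy log-likelihoods, invented for illustration only (not results from this repository).
labels = np.array([0, 0, 0, 1, 1])
log_p_bijective = np.array([-10.0, -12.0, -11.0, -25.0, -11.5])
# Assumption: the injective density differs by a small, nearly constant correction.
log_p_injective = log_p_bijective + np.array([-4.0, -4.1, -3.9, -4.2, -4.0])

auc_bijective = roc_auc(-log_p_bijective, labels)
auc_injective = roc_auc(-log_p_injective, labels)
```

Because the AUC depends only on how the samples are ranked, two densities that order the test samples the same way give the same AUC even when their absolute likelihood values differ.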

3. Model Evaluation

Figure 3.1: Forward propagation of an input image during the evaluation phase.

To reduce the computational cost, we first extract features from an input test image with a feature extractor of our choice and then feed them to the injective part of the model. The extracted features define the high-dimensional input space R^D. The transformations of the injective part reduce the input dimension and produce lower-dimensional features in R^d. These features are then propagated through the bijective part of the model, which, unlike the injective part, preserves their dimension and maps them into a latent space R^d. By imposing a Gaussian distribution on the latent space, we obtain a closed-form expression for the likelihood of the input features. In this way, we can easily calculate the likelihood of a new test sample with respect to two different densities: one is estimated with respect to the bijective transformations only, and the other with respect to both the bijective and the injective transformations. The calculated likelihoods are then used to classify the input test example as anomalous or nonanomalous.
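
The two likelihoods can be illustrated with a minimal NumPy sketch. It assumes a toy model that is not the repository's architecture: a linear injective map g(u) = A u from R^d to R^D, an elementwise affine bijective map h(z) = scale * z + shift, and a standard Gaussian latent. The names `two_log_likelihoods`, `A`, `scale`, and `shift` are hypothetical. The density in R^d follows the change-of-variables formula for h; the density on the embedded manifold in R^D adds the volume term -1/2 log det(A^T A) of the injective map.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log-density of N(0, I) for each row of z."""
    d = z.shape[-1]
    return -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

def two_log_likelihoods(x, A, scale, shift):
    """Return (log p in R^d, log p on the manifold in R^D) for each row of x.

    Toy assumptions: g(u) = A @ u with A of shape (D, d) and full column rank,
    h(z) = scale * z + shift applied elementwise, and z ~ N(0, I).
    """
    # Injective part, evaluation direction: project x to R^d with the left pseudo-inverse.
    u = x @ np.linalg.pinv(A).T
    # Bijective part, evaluation direction: invert the affine map to reach the latent space.
    z = (u - shift) / scale
    # Density with respect to the bijective map only (change of variables for h).
    log_p_bijective = standard_normal_logpdf(z) - np.log(np.abs(scale)).sum()
    # Density with respect to both maps: add the volume change of the injective map.
    _, logdet = np.linalg.slogdet(A.T @ A)
    log_p_injective = log_p_bijective - 0.5 * logdet
    return log_p_bijective, log_p_injective
```

In this linear toy case the two log-likelihoods differ by a constant, so they rank all samples identically. With nonlinear injective layers the volume term depends on the sample, which is why the two densities are compared empirically.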

4. MNIST Experiments

Given the complexity of our work, we began with a thorough evaluation of the generative and discriminative power of the model on the MNIST dataset, a standard benchmark for deep learning. From these experiments, we concluded that the model can reconstruct high-quality images and generate new images from the learned distribution.

Figure 4.1: Example of the reconstruction of 36 input MNIST images using injective models trained on 30,000 MNIST training examples. For each model, the different depth of the injective map, i.e. the number of squeeze-bijective revnet-injective revnet blocks, is written above each column. The first row shows the reconstructions of the whole injective-bijective transformation, while the second row shows their corresponding reconstructions obtained by applying only the inverse bijective transformation.

In contrast, we noticed that the injective flow with deeper injective mappings becomes quite unstable in reconstructing outliers. We tested the discriminatory performance in anomaly detection of the model based on the MNIST dataset using seven different test sets, of which six were artificially created. Having concluded that the model has a remarkable ability to detect anomalies for handwritten digits, we proceeded to work on the same problem for the MVTec dataset.

In addition to reconstructing examples from the learned distribution, we also tested the reconstructive power of the injective model on out-of-distribution data. For this purpose, we worked with examples from the Fashion MNIST dataset, reconstructed by an injective flow with a shallow injective map of depth 1. From the results, we concluded that our injective model visually alters the clothes in the images so that the reconstructions stay close to the learned manifold of the training examples.

Figure 4.2: Example of out-of-distribution data reconstruction using an injective model trained on the MNIST dataset.

Our model proved to be very powerful when generating new images never seen in the training dataset.

Figure 4.3: Example of newly generated digits using an injective model trained on 30,000 MNIST training images. Above each image, we show the depth of the injective map of the model that generates the displayed digits.

5. MVTec Results

Given the dimensionality of the MVTec images, we used pretrained VGG16 and DenseNet-121 networks (transfer learning) to extract their features and thus reduce their dimension while preserving important image information. The extracted features were then averaged across the channel dimension. This reduces the computational complexity of the model beyond what the injective mapping already provides. In addition to feature extraction, we applied background extraction, data augmentation, and data standardization to the MVTec training and test examples. Each of these preprocessing steps improved the model’s performance in a different way. The remainder of this section covers the evaluation of the model on images preprocessed in this way.
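
The channel averaging and standardization steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the repository's preprocessing code: feature maps are assumed to be channel-last with shape (N, H, W, C), the statistics are assumed to be scalars computed on the training set only, and the helper names `average_channels`, `fit_standardizer`, and `standardize` are hypothetical. Where exactly in the pipeline the standardization is applied is also an assumption here.

```python
import numpy as np

def average_channels(features):
    """Collapse (N, H, W, C) feature maps to (N, H, W, 1) by averaging over channels."""
    return features.mean(axis=-1, keepdims=True)

def fit_standardizer(train, eps=1e-8):
    """Compute scalar mean and standard deviation on the training data only."""
    return train.mean(), train.std() + eps

def standardize(x, mean, std):
    """Apply the training-set statistics to any split (training or test)."""
    return (x - mean) / std

# Random stand-ins for extracted feature maps (hypothetical sizes, 16 channels).
rng = np.random.default_rng(0)
train_feats = rng.normal(loc=3.0, scale=2.0, size=(8, 32, 32, 16))
test_feats = rng.normal(loc=3.0, scale=2.0, size=(4, 32, 32, 16))

train = average_channels(train_feats)   # shape (8, 32, 32, 1)
test = average_channels(test_feats)     # shape (4, 32, 32, 1)

mean, std = fit_standardizer(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)
```

Fitting the statistics on the training split and reusing them for the test split keeps the test data from influencing the preprocessing.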

Table 5.1: Evaluation results of the injective model when trained and tested on the VGG16 extracted features per category. The results are shown with respect to the original, standardized, and augmented input images. The upper values in each row represent the AUC of the performance of the injective flow, while the lower values represent the AUC of the performance of the bijective flow. The best value for each category is shown in bold.

Table 5.2: Evaluation results of the injective model when trained and tested on the DenseNet-121 extracted features per category. The results are shown with respect to the original, standardized, and two differently augmented datasets. The upper values in each row represent the AUC of the injective flow, while the lower values represent the AUC of the bijective flow. The best value for each category is shown in bold.

6. Conclusion

Our injective model outperforms the established baselines in detecting defective MVTec objects and textures for most of the categories.

Table 6.1: Comparison of the best AUC values obtained for the injective model relative to those corresponding to the baseline models. The best results for each MVTec category are shown in bold.