Usage: train.py [--path PATH] [--dataset DATASET]
                [--testset_type TESTSET_TYPE] [--categoryName CLASSNAME]
                [--featureExtractor FEATUREEXTRACTOR] [--runTrain RUNTRAIN]
                [--num_epochs NUM_EPOCHS] [--threshold THRESHOLD]
                [--batch_size BATCH_SIZE] [--lr LR]

Arguments:
--path             path to the main directory
--dataset          which dataset to work with [MNIST, MVTEC]
--testset_type     which MNIST testset to work with [diagonal, off-diagonal, spots, cross, mixed, inverted, fashion]
--categoryName     which MVTEC category to work with [bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper]
--featureExtractor which feature extractor to use [densenet, vgg16]
--runTrain         whether to run the training procedure
--num_epochs       number of training epochs
--threshold        when the maximum-likelihood (ML) training should begin
--batch_size       size of the training batch
--lr               learning rate
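
The options above could be parsed along the following lines. This is only a sketch, not the actual code of train.py: the flag names and choice lists come from the usage text, while the argument types, the absence of defaults, and the `str2bool` helper are assumptions made for illustration. The values in the example call are arbitrary.

```python
import argparse

MNIST_TESTSETS = ["diagonal", "off-diagonal", "spots", "cross",
                  "mixed", "inverted", "fashion"]
MVTEC_CATEGORIES = ["bottle", "cable", "capsule", "carpet", "grid",
                    "hazelnut", "leather", "metal_nut", "pill", "screw",
                    "tile", "toothbrush", "transistor", "wood", "zipper"]


def str2bool(value):
    """Hypothetical helper: interpret common true/false spellings."""
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "y"}:
        return True
    if text in {"0", "false", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Flag names and choices follow the usage text; the types are assumed.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--path", type=str, help="path to the main directory")
    parser.add_argument("--dataset", choices=["MNIST", "MVTEC"])
    parser.add_argument("--testset_type", choices=MNIST_TESTSETS)
    parser.add_argument("--categoryName", choices=MVTEC_CATEGORIES)
    parser.add_argument("--featureExtractor", choices=["densenet", "vgg16"])
    parser.add_argument("--runTrain", type=str2bool)
    parser.add_argument("--num_epochs", type=int)
    parser.add_argument("--threshold", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--lr", type=float)
    return parser


if __name__ == "__main__":
    # Arbitrary example values, chosen only to exercise the parser.
    args = build_parser().parse_args(
        ["--dataset", "MNIST", "--testset_type", "fashion",
         "--runTrain", "true", "--lr", "1e-4"]
    )
    print(args)
```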

Instructions for accessing the datasets:
- The whole MVTec AD dataset can be downloaded from the official MVTec website: https://www.mvtec.com/company/research/datasets/mvtec-ad
- The official link for downloading the MNIST dataset: http://yann.lecun.com/exdb/mnist/
- The Fashion MNIST dataset can be downloaded from the zalandoresearch/fashion-mnist GitHub repository: https://github.com/zalandoresearch/fashion-mnist/tree/master/data/fashion

Detection of Anomalous Images using Injective Flows

This README provides a rough overview of the reconstructive and generative power of the implemented injective model and illustrates how the model handles out-of-distribution (OOD) detection on high-dimensional image datasets.

Experiments conducted on the MVTec dataset are not included in this description. Contact me for more information.

1. Introduction

Traditional normalizing flows are computationally expensive to train, mainly because they operate in exactly the same dimension as the input, which is usually high. We built on the Trumpet model (https://github.com/swing-research/trumpets.git) to implement an injective flow that reduces this computational cost by means of an injective mapping. The main task to which we adapted this model is the detection of defects in the manufacturing industry, working with images of various objects and textures from the MVTec dataset. In other words, we used the injective flow to learn the distribution of healthy, non-defective images and to estimate the exact likelihood of new images. Based on this likelihood, we decide whether a new image belongs to the learned distribution (non-defective) or is an outlier (defective).

Figure 1.1: MVTec dataset. Source: Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, and Carsten Steger. The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. International Journal of Computer Vision, 129(4):1038–1059, 2021.

2. Model Architecture

Figure 2.1: Model architecture. The input size 32x32x1 represents the dimension of the MVTec features extracted through the DenseNet-121 feature extractor.

The architecture of the injective flow allows efficient likelihood computation for a new sample with respect to two different densities, learned in the two output spaces of the model: the output space of the bijective map and the output space of the injective map. After training the model on nonanomalous examples, we therefore evaluated the likelihood (the probability that the instance belongs to the learned distribution) of each test instance under both learned densities. Our goal was to check whether these two likelihoods differ drastically. Intuitively, we were checking whether something that looks like an anomaly in one space looks nonanomalous in the other, and vice versa. From the evaluation results, summarized by the area under the ROC curve (AUC-ROC), we observed that, for the purpose of outlier detection, the difference between the densities in the two spaces is small and can be neglected. This indicates that, as we expected, the injective mapping makes model evaluation faster but not better.
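
The comparison described above can be sketched as follows. This is an illustration, not the repository's evaluation code: the per-sample log-likelihoods are synthetic stand-ins drawn from Gaussians, the names `loglik_bijective` and `loglik_injective` are hypothetical, and the small offset between the two score sets is an assumption that mimics "similar densities in both spaces". The AUC-ROC is computed from the Mann-Whitney U statistic, so only NumPy is needed.

```python
import numpy as np


def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: higher means "more anomalous"; labels: 1 = anomalous, 0 = normal.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count pairs where the anomalous sample outscores the normal one;
    # tied pairs count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Synthetic stand-ins for per-sample log-likelihoods in the two spaces.
rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
loglik_bijective = np.concatenate(
    [rng.normal(0.0, 1.0, 200), rng.normal(-3.0, 1.0, 200)]
)
# Assumed: the second density differs from the first only by a small offset.
loglik_injective = loglik_bijective + rng.normal(0.0, 0.1, 400)

# A low likelihood should flag an anomaly, so score = negative log-likelihood.
auc_bij = auroc(-loglik_bijective, labels)
auc_inj = auroc(-loglik_injective, labels)
print(f"AUC-ROC (bijective output space): {auc_bij:.3f}")
print(f"AUC-ROC (injective output space): {auc_inj:.3f}")
```

If the two AUC-ROC values are close, choosing one density over the other makes little difference for outlier detection, which is the kind of observation reported above.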

3. Model Evaluation

Figure 3.1: Forward propagation of an input image during the evaluation phase.

For computational efficiency, we first extract features from an input test image with a feature extractor of our choice and then feed them to the injective part of the model. The extracted features define the high-dimensional input space R^D. The injective part reduces the input dimension, producing lower-dimensional features in R^d. These features are then propagated through the bijective part of the model, which, unlike the injective part, preserves the dimension and maps them into a latent space R^d. By imposing a Gaussian distribution on the latent space, the transformed features are normally distributed, which gives a closed-form expression for their probability. In this way, we can easily compute the likelihood of a new test sample with respect to two different densities: one estimated from the bijective transformations alone, and the other from both the bijective and injective transformations. The computed likelihoods are then used to classify the input test example as anomalous or nonanomalous.
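
To make the two densities concrete, here is a minimal NumPy sketch that replaces the learned networks with linear stand-ins. Everything in it is an assumption for illustration: the sizes `D` and `d`, the random matrices `W` (injective part, R^d to R^D) and `A` (bijective part, R^d to R^d), and the function names. It applies the standard change-of-variables formula: the density in R^d adds log|det A| to the Gaussian log-density, and the density on the image of the injective map in R^D further subtracts 0.5 * log det(J^T J), where J is the Jacobian of the injective map.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 16, 4  # illustrative sizes for R^D (features) and R^d (reduced space)

# Linear stand-in for the injective part: g(h) = W h maps R^d -> R^D.
W = rng.normal(size=(D, d))
W_pinv = np.linalg.pinv(W)  # left inverse, used to go from R^D down to R^d

# Linear stand-in for the bijective part: f(h) = A h maps R^d -> R^d.
A = rng.normal(size=(d, d)) + 3.0 * np.eye(d)


def gaussian_logpdf(z):
    """Log-density of a standard normal in len(z) dimensions."""
    return -0.5 * (z @ z) - 0.5 * len(z) * np.log(2.0 * np.pi)


def log_likelihoods(x):
    """Return (log p in R^d, log p on the image of g in R^D) for features x."""
    h = W_pinv @ x  # injective part, evaluation direction
    z = A @ h       # bijective part
    _, logdet_A = np.linalg.slogdet(A)
    # Change of variables through the bijective map only.
    logp_low = gaussian_logpdf(z) + logdet_A
    # Injective correction: subtract 0.5 * log det(J^T J), with J = W here.
    _, logdet_WtW = np.linalg.slogdet(W.T @ W)
    logp_high = logp_low - 0.5 * logdet_WtW
    return logp_low, logp_high


x = W @ rng.normal(size=d)  # a sample that lies on the image of g
print(log_likelihoods(x))
```

With linear maps the correction term is the same for every sample, so both scores rank test samples identically. With nonlinear learned maps the Jacobian term depends on the sample, which is why the two densities can in principle disagree.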

4. MNIST Experiments

Given the complexity of the task, we began with a thorough evaluation of the generative and discriminative power of the model on MNIST, a standard deep-learning benchmark. From these experiments, we concluded that the model is capable of reconstructing high-quality images and generating new images from the learned distribution.

Figure 4.1: Example of the reconstruction of 36 input MNIST images using injective models trained on 30,000 MNIST training examples. The depth of each model's injective map, i.e. the number of blocks (each consisting of a squeeze, a bijective revnet, and an injective revnet), is written above the corresponding column. The first row shows the reconstructions obtained from the whole injective-bijective transformation, while the second row shows the corresponding reconstructions obtained by applying only the inverse bijective transformation.

MarijaStojchevska / Injective-Flow-for-Anomaly-Detection
