Usage: train.py [--path PATH] [--dataset DATASET]
[--testset_type TESTSET_TYPE] [--categoryName CLASSNAME]
[--featureExtractor FEATUREEXTRACTOR] [--runTrain RUNTRAIN]
[--num_epochs NUM_EPOCHS] [--threshold THRESHOLD]
[--batch_size BATCH_SIZE] [--lr LR]
Arguments:
--path path to the main directory
--dataset which dataset to work with [MNIST, MVTEC]
--testset_type which MNIST testset to work with [diagonal, off-diagonal, spots, cross, mixed, inverted, fashion]
--categoryName which MVTEC category to work with [bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper]
--featureExtractor which feature extractor to use [densenet, vgg16]
--runTrain whether to run the training procedure
--num_epochs number of training epochs
--threshold epoch at which maximum-likelihood (ML) training begins
--batch_size size of the training batch
--lr learning rate
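For example, a typical MNIST training run might look as follows (the flag names are taken from the usage above; the concrete values are only illustrative):

    python train.py --path ./ --dataset MNIST --testset_type spots --featureExtractor densenet --runTrain True --num_epochs 100 --threshold 50 --batch_size 64 --lr 0.0001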
Instructions for accessing the datasets:
- The whole MVTec AD dataset can be downloaded from the official MVTec website: https://www.mvtec.com/company/research/datasets/mvtec-ad
- The official link for downloading the MNIST dataset: http://yann.lecun.com/exdb/mnist/
- The Fashion MNIST dataset can be downloaded from the zalandoresearch/fashion-mnist GitHub repository: https://github.com/zalandoresearch/fashion-mnist/tree/master/data/fashion
This README provides a rough overview of the reconstructive and generative power of the implemented injective model and illustrates how the model handles out-of-distribution (OOD) data over high-dimensional image datasets.
Experiments conducted on the MVTec dataset are not included in this description; contact me for more information.
For computational efficiency, we first extract features from an input test image with a feature extractor of our choice and then feed them to the injective part of the model. The extracted features define the high-dimensional input space R^D. The transformations of the injective part reduce the input dimension, producing lower-dimensional features in the space R^d. These features are then propagated through the bijective part of the model, which, unlike the injective part, preserves their dimension and maps them into a latent space R^d. By imposing a Gaussian distribution on the latent space, we make the mapped features normally distributed and obtain a closed-form expression for their likelihood. In this way, we can easily evaluate the likelihood of a new test sample under two different densities: one estimated with respect to the bijective transformations alone, and the other with respect to both the bijective and injective transformations. The computed likelihoods are then used to classify the input test example as anomalous or non-anomalous.
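The following minimal sketch illustrates this scoring procedure. Here injective and bijective are hypothetical stand-ins for the two parts of the model, each assumed to return its transformed features together with the accumulated log-determinant term of its transformation; tau is a user-chosen decision threshold, and which of the two likelihoods is thresholded is a design choice:

    import numpy as np

    def gaussian_log_prob(z):
        """Log density of a standard Gaussian N(0, I) evaluated at z."""
        d = z.size
        return -0.5 * (d * np.log(2.0 * np.pi) + np.sum(z ** 2))

    def is_anomalous(x, injective, bijective, tau):
        """Flag x as anomalous by thresholding its log-likelihood."""
        # Injective part: R^D features -> lower-dimensional R^d features,
        # together with its (generalized) log-det contribution.
        y, logdet_inj = injective(x)
        # Bijective part: R^d features -> R^d latent code z.
        z, logdet_bij = bijective(y)

        # Density estimated w.r.t. the bijective transformations only.
        ll_bijective = gaussian_log_prob(z) + logdet_bij
        # Density estimated w.r.t. both bijective and injective parts.
        ll_full = ll_bijective + logdet_inj

        # Low likelihood under the learned density => anomalous.
        return ll_full < tau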
In addition to reconstructing examples from the learned distribution, we also test the reconstructive power of the injective model on out-of-distribution data. For this purpose, we use examples from the Fashion MNIST dataset and reconstruct them with an injective flow whose injective map is shallow (depth = 1). From the results, we conclude that the injective model visually alters the clothing items in the images, keeping the reconstructions close to the learned manifold of the training examples.
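As a sketch of that reconstruction step (the inverse and forward handles are hypothetical): an out-of-distribution image is projected to the low-dimensional space and mapped back, so the output necessarily lies on the learned manifold.

    def reconstruct(x, injective):
        """Project x onto the learned manifold and map it back."""
        y = injective.inverse(x)     # R^D -> R^d projection
        return injective.forward(y)  # R^d -> R^D reconstruction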
Our model also proved capable of generating plausible new images that never appear in the training dataset.
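Generation follows the usual normalizing-flow recipe: sample latent codes from the standard Gaussian and push them back through the inverse of the bijective part and the forward injective map (again with hypothetical handles):

    import numpy as np

    def generate(injective, bijective, d, n=16):
        """Sample n new examples from the learned distribution."""
        z = np.random.randn(n, d)    # latent codes z ~ N(0, I)
        y = bijective.inverse(z)     # latent space -> R^d features
        return injective.forward(y)  # R^d -> high-dimensional samples in R^D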
Finally, our injective model outperforms the established baselines in detecting defective MVTec objects and textures for most of the categories.