Carmoondedraak / FACT-2021


Progress Anomaly Maps #7

Open KJ-Waller opened 3 years ago

KJ-Waller commented 3 years ago

MNIST Results: We used a standard variational autoencoder, trained separately with inlier digits 1 and 3. Each model was trained with a batch size of 128 on unnormalized 28x28x1 grayscale images, for a total of 150 epochs. For each inlier digit we trained 3 models, with 3 different learning rates: 0.001, 0.0005, and 0.0001.
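For reference, below is a minimal PyTorch sketch of such a setup. The layer sizes and latent dimension are assumptions for illustration, not the exact architecture in our code; only the image size, batch size, number of epochs and learning rates come from the description above.

```python
# Minimal sketch of a small convolutional VAE for 28x28x1 MNIST digits.
# Assumed architecture; training config from the text: batch size 128,
# 150 epochs, one model per learning rate in {1e-3, 5e-4, 1e-4}.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),   # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_logvar = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 64 * 7 * 7)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),   # 14 -> 28
            nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # reparameterization trick
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_rec = self.dec(self.fc_dec(z).view(-1, 64, 7, 7))
        return x_rec, mu, logvar

def vae_loss(x_rec, x, mu, logvar):
    # reconstruction term (pixels assumed in [0, 1]) + KL to the unit Gaussian prior
    rec = F.binary_cross_entropy(x_rec, x, reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```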

Loss curves for the 3 models trained on digit 1: mnist_loss_digit1 Attention maps for the three models (trained with different learning rates) evaluated on digit 7: mnist_attmaps_d1d7_lr1e-3 mnist_attmaps_d1d7_lr1e-4 mnist_attmaps_d1d7_lr5e-4

Loss curves for the 3 models trained on digit 3: mnist_loss_digit3 Attention maps for the three models (trained with different learning rates) evaluated on digit 7: mnist_attmaps_d3d7_lr1e-3 mnist_attmaps_d3d7_lr1e-4 mnist_attmaps_d3d7_lr5e-4

UPDATE: The best performing models for MNIST in terms of VAE loss were trained with a learning rate of 0.0005, which are model number 2 for digit 1 and model number 5 for digit 3. Below we show some of the attention maps of these two models evaluated on outlier digits.

For model 2 trained on digit 1, evaluated on digits 7, 4, 9 and 2, as in the paper: batch4-attmaps batch6-attmaps batch8-attmaps batch23-attmaps

For model 5 trained on digit 3, evaluated on digits 8 and 5: batch21-attmaps batch14-attmaps

KJ-Waller commented 3 years ago

UCSD Results: We extended the variational autoencoder used on MNIST by adding two convolutional layers to handle the 100x100x1 images of the UCSD dataset. Again, no normalization was applied to the images. We trained a total of 3 models, with 3 different learning rates: 0.001, 0.0005 and 0.0001, each with a batch size of 128. We evaluated each model by visualizing the attention maps generated from the activations and gradients of 3 different layers, as mentioned in the paper, with output sizes 50x50, 25x25 and 12x12.
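For context, below is a hedged, Grad-CAM-style sketch of how an attention map can be computed from the activations and gradients of one convolutional layer. The hook-based implementation and the choice of a scalar score over the latent mean are assumptions for illustration, not necessarily how our code or the paper's method does it.

```python
# Sketch: attention map from one conv layer's activations and gradients.
# `model` is assumed to return (x_rec, mu, logvar); `layer` is the conv layer to probe.
import torch
import torch.nn.functional as F

def attention_map(model, layer, x):
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    x_rec, mu, logvar = model(x)
    score = mu.pow(2).sum()   # assumed scalar score over the latent code
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    a, g = acts['a'], grads['g']
    weights = g.mean(dim=(2, 3), keepdim=True)            # channel-wise gradient weights
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))   # weighted sum of activations
    cam = F.interpolate(cam, size=x.shape[-2:], mode='bilinear', align_corners=False)
    # normalize each map to [0, 1]
    cam_min = cam.amin(dim=(2, 3), keepdim=True)
    cam_max = cam.amax(dim=(2, 3), keepdim=True)
    return ((cam - cam_min) / (cam_max - cam_min + 1e-8)).detach()
```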

Loss curves for the 3 models: ucsd_loss

Each of the models was quantitatively evaluated on a test set containing images with pedestrians and vehicles, by calculating the AUROC of the attention maps against the ground-truth masks. Coincidentally, each model (one per learning rate) achieved the same AUROC regardless of which convolutional layer was used for attention map generation, so we only report the AUROC of each of the 3 models, rather than per layer.
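A minimal sketch of this pixel-wise AUROC evaluation, assuming numpy arrays of attention maps and binary masks (the function and variable names are ours, for illustration only):

```python
# Pixel-wise AUROC: attention map scores evaluated against ground-truth masks.
import numpy as np
from sklearn.metrics import roc_auc_score

def pixelwise_auroc(attention_maps, masks):
    # attention_maps: iterable of (H, W) float score maps
    # masks: iterable of (H, W) binary target masks
    scores = np.concatenate([a.ravel() for a in attention_maps])
    targets = np.concatenate([m.ravel() for m in masks]).astype(int)
    return roc_auc_score(targets, scores)
```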

The following results show the input images, attention maps, generated binary localization maps and target masks for model 3, which achieved the best AUROC. batch30-input batch30-attmaps batch30-blocmaps batch30-targets
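For completeness, here is a minimal sketch of how the binary localization maps could be obtained from normalized attention maps by thresholding; the actual threshold selection in our code is not specified here, and the value below is only an assumed default.

```python
# Binary localization map via a simple global threshold on a normalized attention map.
import numpy as np

def binary_localization_map(attention_map, threshold=0.5):
    # attention_map assumed normalized to [0, 1]; pixels above the threshold
    # are marked as anomalous (1), the rest as normal (0)
    return (attention_map >= threshold).astype(np.uint8)
```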

KJ-Waller commented 3 years ago

MVTEC Results: We report some preliminary results on the MVTEC dataset. For this dataset, we used a ResNet-18 based VAE on images of 256x256x3. The images were normalized in this case. One of the better performing models in terms of AUROC was trained on the hazelnut images of the MVTEC dataset. This model was trained with a learning rate of 0.0005 and achieved an AUROC of 0.8225. Due to the size of the model and images, the batch size had to be restricted to 16 in order to fit into VRAM. We have yet to evaluate the attention maps generated from different layers of this model.
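As a rough illustration of such an encoder, the sketch below builds a ResNet-18 based VAE encoder for 256x256x3 inputs using torchvision's resnet18 as the backbone; the backbone choice, latent dimension and the omitted decoder are assumptions for illustration, not necessarily our exact implementation.

```python
# Sketch: ResNet-18 feature extractor reused as a VAE encoder for 256x256x3 images.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetVAEEncoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        # drop the classification head, keep the convolutional trunk + global avg pool
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)
        self.fc_mu = nn.Linear(512, latent_dim)
        self.fc_logvar = nn.Linear(512, latent_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)
```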

Below are some results for a batch of 16 hazelnuts. We show the original input image, the generated attention maps, the generated binary localization maps and the target masks. batch3-input batch3-attmaps batch3-blocmaps batch3-targets

We also include some reconstructed images for a batch, which show a considerable lack of detail. 480-rec

UPDATE: MVTEC results per object:

| Object | Results | Seed |
| --- | --- | --- |
| Carpet | 0.68 / 0.56 | 3 |
| Grid | 0.69 / 0.52 | 3 |
| Leather | 0.80 / 0.55 | 3 |
| Tile | 0.73 / 0.54 | 3 |
| Wood | 0.7 / 0.54 | 3 |
| Bottle | 0.7 / 0.59 | 3 |
| Cable | 0.79 / 0.59 | 3 |
| Capsule | 0.82 / 0.50 | 3 |
| Hazelnut | 0.9 / 0.6 | 3 |
| Metal Nut | 0.8 / 0.5 | 3 |
| Pill | 0.9 / 0.55 | 2 |
| Screw | 0.91 / 0.48 | 3 |
| Toothbrush | 0.83 / 0.49 | 3 |
| Transistor | 0.72 / 0.48 | 3 |
| Zipper | 0.67 / 0.48 | 3 |

Below, for each MVTec-AD object, we show defective objects, the attention maps, the generated binary localization maps and the target masks.

Carpet: batch2-input batch2-attmaps batch2-blocmaps batch2-targets

Grid: batch13-input batch13-attmaps batch13-blocmaps batch13-targets

Leather: batch5-input batch5-attmaps batch5-blocmaps batch5-targets

Tile: batch5-input batch5-attmaps batch5-blocmaps batch5-targets

Wood: batch2-input batch2-attmaps batch2-blocmaps batch2-targets

Bottle: batch10-input batch10-attmaps batch10-blocmaps batch10-targets

Cable: batch15-input batch15-attmaps batch15-blocmaps batch15-targets

Capsule: batch5-input batch5-attmaps batch5-blocmaps batch5-targets

Hazelnut: batch2-input batch2-attmaps batch2-blocmaps batch2-targets

Metal Nut: batch15-input batch15-attmaps batch15-blocmaps batch15-targets

Pill: batch1-input batch1-attmaps batch1-blocmaps batch1-targets

Screw: batch1-input batch1-attmaps batch1-blocmaps batch1-targets

Toothbrush: batch3-input batch3-attmaps batch3-blocmaps batch3-targets

Transistor: batch9-input batch9-attmaps batch9-blocmaps batch9-targets

Zipper: batch9-input batch9-attmaps batch9-blocmaps batch9-targets