Simsso / NIPS-2018-Adversarial-Vision-Challenge

Code, documents, and deployment configuration files related to our participation in the 2018 NIPS Adversarial Vision Challenge "Robust Model Track"

LESCI Development and Evaluation #86

Closed Simsso closed 5 years ago

Simsso commented 5 years ago

LESCI (thanks for that awesome naming suggestion @FlorianPfisterer) stands for large embedding space constant initialization. This work item involves the development of a LESCI layer (based on a VQ-layer with cosine similarity lookup #63) and its empirical evaluation.
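
For reference, a minimal numpy sketch of the cosine-similarity lookup such a VQ layer performs for each input vector (function name and interface are illustrative, not the repo's actual code):

```python
import numpy as np

def cosine_lookup(x, emb):
    """Index of the embedding vector with the highest cosine similarity
    to the input x -- the per-vector lookup a #63-style VQ layer performs.
    x: activation vector of shape (d,); emb: embedding space of shape (n, d)."""
    x_n = x / np.linalg.norm(x)
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return int(np.argmax(emb_n @ x_n))  # most similar embedding vector
```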

Simsso commented 5 years ago

Steps

  1. [x] Export a bunch of activations from a ResNet layer using training samples (only the correctly classified ones; see the sketch after this list)
  2. [x] Import the activations into a Python script
  3. [x] Develop a LESCI layer function which initializes the embedding space of a #63 layer with the imported activations
  4. [x] Feed validation samples through the network and make sure it neither guesses at random nor is 100% accurate (it has to work)
  5. [ ] Fine-tune the abs_identity_mapping_threshold hyperparameter by creating a plot "threshold vs. validation accuracy"
  6. [ ] Settle with a value and export the network with the LESCI layer into a checkpoint
  7. [ ] Submit to the challenge (makes everything else seem simple)
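
A rough sketch of steps 1-3, assuming a hypothetical `model(x)` interface that returns the predicted class and the activation vector of the chosen layer (function names and the output file are illustrative, not the repo's actual code):

```python
import numpy as np

def export_correct_activations(model, images, labels, path="lesci_acts.npz"):
    # Step 1: keep only activations of correctly classified training samples.
    acts, kept_labels = [], []
    for x, y in zip(images, labels):
        pred, act = model(x)  # assumed interface: (predicted class, activation)
        if pred == y:
            acts.append(act)
            kept_labels.append(y)
    np.savez(path, activations=np.stack(acts), labels=np.array(kept_labels))

def load_embedding_space(path="lesci_acts.npz"):
    # Steps 2-3: import the activations so they can initialize the
    # embedding space of the LESCI (#63) layer.
    data = np.load(path)
    return data["activations"], data["labels"]
```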
Simsso commented 5 years ago

LESCI + PCA + Cosine Similarity VQ-Layer

Suggesting the following setup (a code sketch of steps 3, 5, and 6 follows the list):

  1. ResNet pre-trained with ALP weights
  2. Feed the training dataset through the network, saving all activations of the layer act5_block3 (for correctly classified samples) along with the labels. That yields a list of tuples (label, activation), where activation is a vector of size 4096. The length of the list is 74,246.
    This layer looks the most promising.
  3. Run a PCA on the list and extract the "compression matrix" (shape: 4096x32; 32 components because they separate the classes best).
  4. Add the compression matrix to the ResNet, following the layer act5_block3.
  5. Attach a cosine VQ-layer (added in PR #88) to the compression-matmul layer. The embedding space is initialized with the list from step (2) multiplied by the PCA matrix from step (3); after the multiplication, each embedding vector is reasonably small (32 instead of 4096 components).
  6. For each sample, perform the k-NN lookup in the compressed space and proceed depending on whether a certain majority threshold is exceeded: (a) if more than e.g. 6 of the 16 nearest neighbors belong to the same class, the network outputs that class (no need to feed through the remaining ResNet layers and no need to compute an inverse at any time); (b) otherwise, use the identity function, skip the VQ layer, and let the ResNet classify.
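
A numpy sketch of steps 3, 5, and 6 (PCA via SVD plus the cosine k-NN majority vote); all function names are illustrative, and the real implementation lives in the #63/#88 VQ layer:

```python
import numpy as np

def pca_matrix(acts, dim=32):
    """Step 3: derive the 4096 x 32 'compression matrix' from the
    stored activations (shape: 74,246 x 4096) via SVD-based PCA."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T  # right-singular vectors = principal directions

def lesci_decision(x, emb, emb_labels, k=16, majority=6):
    """Steps 5-6: cosine k-NN lookup in the compressed space.
    Returns (label, True) if more than `majority` of the k nearest
    neighbors agree -- case (a), shortcut the ResNet -- and
    (None, False) for case (b), identity / fall back to the ResNet."""
    x_n = x / np.linalg.norm(x)
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb_n @ x_n
    nn = np.argsort(-sims)[:k]           # indices of the k most similar
    votes = np.bincount(emb_labels[nn])  # class votes among the neighbors
    if votes.max() > majority:
        return int(votes.argmax()), True
    return None, False
```

The `k=16` and `majority=6` defaults mirror the 6-of-16 example above; in practice they would be tuned together with abs_identity_mapping_threshold.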
Simsso commented 5 years ago

Next steps: hyperparameter tuning and positioning of the layer