A Siamese network is a neural network that contains two or more identical subnetworks. Its objective is to measure the similarity of, or the relationship between, two comparable inputs. Unlike a classification task, which typically uses cross-entropy as the loss function, a Siamese network usually uses a contrastive loss or a triplet loss.
Siamese networks have many applications; this repository uses one for dimensionality reduction and image retrieval.
This project follows Hadsell et al. (2006) [1]: it computes the Euclidean distance between the outputs of the shared network and optimizes the contrastive loss (see the paper for more details). With the label convention used here ($Y = 1$ for a similar pair, $Y = 0$ for a dissimilar pair), the contrastive loss is defined as follows:

$$L(W, Y, X_1, X_2) = Y \cdot \frac{1}{2} D_W^2 + (1 - Y) \cdot \frac{1}{2} \max(0, m - D_W)^2$$

$D_W$ is the distance between the outputs of the network for the input $X_1$ and the input $X_2$.
The similarity term is defined as $Y \cdot \frac{1}{2} D_W^2$. It is active when the label $Y$ equals 1 and vanishes when $Y$ equals 0. Its goal is to minimize the distance between similar pairs.
The dissimilarity term is defined as $(1 - Y) \cdot \frac{1}{2} \max(0, m - D_W)^2$. It is active when $Y$ equals 0 and vanishes when $Y$ equals 1. Its goal is to penalize dissimilar pairs whose distance is smaller than the margin $m$.
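The loss described above can be sketched in plain Python to make the two terms concrete. This is a minimal illustration, not the repository's TensorFlow code: the function names, the default margin of 1.0, and the averaging over the batch are assumptions made for the example. Labels follow the convention in the text (1 for a similar pair, 0 for a dissimilar pair).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(left, right, labels, margin=1.0):
    """Mean contrastive loss over a batch of embedding pairs.

    left, right: lists of embedding vectors (outputs of the shared network).
    labels: 1 for a similar pair, 0 for a dissimilar pair.
    margin: hypothetical default; dissimilar pairs farther apart than
            this contribute nothing to the loss.
    """
    total = 0.0
    for a, b, y in zip(left, right, labels):
        d = euclidean_distance(a, b)
        # Pulls similar pairs together (active only when y == 1).
        similarity = y * 0.5 * d ** 2
        # Pushes dissimilar pairs apart until they reach the margin
        # (active only when y == 0).
        dissimilarity = (1 - y) * 0.5 * max(0.0, margin - d) ** 2
        total += similarity + dissimilarity
    return total / len(labels)


# One similar pair at distance 5.0 and one dissimilar pair at distance 0.5.
left = [[0.0, 0.0], [0.0, 0.0]]
right = [[3.0, 4.0], [0.0, 0.5]]
print(contrastive_loss(left, right, labels=[1, 0]))
```

A dissimilar pair that is already farther apart than the margin yields zero loss, so training effort concentrates on dissimilar pairs that are still too close.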
[1] "Dimensionality Reduction by Learning an Invariant Mapping" http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
The inputs to the model are image_left, image_right and the label. The model uses 5 convolutional layers followed by pooling. We do not use fully connected layers because the convolution operation is faster on GPU (especially with cuDNN). See http://cs231n.github.io/convolutional-networks/#convert for more information on converting FC layers to Conv layers.
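To illustrate how the three inputs relate, the sketch below builds (left, right, label) triples from a list of class labels. It is only an assumption about how pairs could be formed: the helper name `make_pairs`, the uniform random sampling, and the rule "same class means similar" are hypothetical and may differ from what train.py actually does.

```python
import random


def make_pairs(class_labels, num_pairs, seed=0):
    """Sample index pairs and their similarity labels.

    Returns a list of (left_index, right_index, y) tuples, where y is 1
    when both samples belong to the same class and 0 otherwise.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        i = rng.randrange(len(class_labels))
        j = rng.randrange(len(class_labels))
        y = 1 if class_labels[i] == class_labels[j] else 0
        pairs.append((i, j, y))
    return pairs


# Toy class labels standing in for MNIST digit labels.
digits = [0, 1, 0, 1, 2]
for left_idx, right_idx, y in make_pairs(digits, num_pairs=5):
    print(left_idx, right_idx, y)
```

With uniform sampling over ten digit classes, most pairs would be dissimilar, so a real training loop might want to balance similar and dissimilar pairs.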
Train the model
git clone https://github.com/ardiya/siamesenetwork-tensorflow
cd siamesenetwork-tensorflow
python train.py
TensorBoard visualization (after training)
tensorboard --logdir=train.log
The image below shows the final result on the MNIST test dataset. Using only 2 features, the model clearly separates the input images.
The GIF below shows how the features evolve during training until they converge.
Image retrieval uses the trained model to extract features and returns the most similar images by cosine similarity.
Select id 865 in the test images
Retrieved the top n most similar images from the train data, with ids [53144 47864 11074 51561 41350 34215 48182]
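The retrieval step can be sketched as a cosine-similarity ranking over precomputed feature vectors. This is a minimal standard-library illustration under stated assumptions: `retrieve_top_n` and `cosine_similarity` are hypothetical names, and the toy vectors stand in for features that the trained network would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_n(query, train_features, n):
    """Return the indices of the n train features most similar to the query."""
    scores = [
        (cosine_similarity(query, feature), idx)
        for idx, feature in enumerate(train_features)
    ]
    # Highest similarity first.
    scores.sort(key=lambda item: item[0], reverse=True)
    return [idx for _, idx in scores[:n]]


# Toy 2-D features; in practice these would come from the trained model.
train_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.1]
print(retrieve_top_n(query, train_features, n=2))  # prints [0, 2]
```

Cosine similarity ignores vector length and compares direction only, so two features that point the same way rank as similar even when their magnitudes differ.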