arsfutura / face-recognition

A framework for creating and using a Face Recognition system.
BSD 3-Clause "New" or "Revised" License

Can you shed a little light on how to classify embedding with Softmax? #21

Closed: yong2khoo-lm closed this issue 3 years ago

yong2khoo-lm commented 3 years ago

I read your great article and the comments in this past issue. You mentioned that Softmax is chosen over kNN because it is less computationally expensive and it "learns boundaries between classes". Can you elaborate a little on "learns boundaries between classes"?

1) I understand Softmax in terms of the math formula - a generalization of the logistic function to multiple dimensions (from Wikipedia).
2) I understand kNN - computing the Euclidean distance in 128 dimensions between the embeddings from different classes.

Now... what does the embedding have to do with Softmax? From my understanding, Softmax maps a series of input values to probability scores. For example: [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0] ----> [0.02364054, 0.06426166, 0.1746813, 0.474833, 0.02364054, 0.06426166, 0.1746813]. The thing is... what are your input values here?
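For reference, that mapping can be reproduced with a minimal NumPy sketch:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

print(softmax([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]))
# -> [0.02364054 0.06426166 0.1746813  0.474833   0.02364054 0.06426166 0.1746813]
```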

Thanks in advance.

ldulcic commented 3 years ago

Hi @kyygit!

I'm glad you liked the article 🙂

You can think of a softmax classifier as a one-layer neural network where softmax is the activation function of the layer. We call it a softmax classifier because we use softmax as the activation function, but there is more to a softmax classifier than the softmax function 🙂 We compute the output of a softmax classifier like this:

o = softmax(x * w + b)

x - face embedding vector; this is the input to the softmax classifier
w - weights of the neural network layer; these are learned through the training process
b - bias term, also part of the learned weights
softmax - the softmax function, which outputs a vector of probabilities
o - output vector of class probabilities

When you want to determine the class of an embedding x, calculate o and take the class with the highest probability.
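As a minimal sketch of that rule (NumPy; the zero-valued w and b below are placeholders standing in for actually trained parameters):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

d, n_classes = 128, 5
w = np.zeros((d, n_classes))   # placeholder for the trained weights
b = np.zeros(n_classes)        # placeholder for the trained bias

x = np.random.randn(d)         # face embedding produced by the embedding network
o = softmax(x @ w + b)         # vector of class probabilities
predicted = int(np.argmax(o))  # class with the highest probability
```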

How does a softmax classifier "learn boundaries between classes"? Well, just like any other supervised classifier, it has a set of learnable parameters w which are learned through the training process. If you look at the equation above, you can see that w dictates what the output will be, which means that w dictates the class of x and therefore it dictates the boundaries between the classes.

To make this more visual, let's assume that the embedding x has only 2 dimensions so we can plot it in 2D. Also, let's limit our space to the area between (-10, -10) and (10, 10), generate 5 classes in that area, and train a softmax classifier on that data. You cannot explicitly extract boundaries from w, but you can calculate the class for each point in 2D from (-10, -10) to (10, 10) and plot it. Here's a visualization of the softmax classifier's boundaries:

[Image: softmax classifier decision boundaries]
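For the curious, here's a hedged sketch of how a plot like this could be generated (assuming scikit-learn's LogisticRegression, which is a softmax classifier in its multinomial form, and matplotlib; illustrative, not this repo's code):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# generate 5 classes of 2D "embeddings" inside (-10, 10) x (-10, 10)
X, y = make_blobs(n_samples=500, centers=5, center_box=(-10, 10), random_state=0)

# multinomial logistic regression is exactly a softmax classifier
clf = LogisticRegression(max_iter=1000).fit(X, y)

# classify every point on a grid and plot the predicted regions
xx, yy = np.meshgrid(np.linspace(-10, 10, 300), np.linspace(-10, 10, 300))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
plt.show()
```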

When classifying embeddings with k-NN, you need to compare the new embedding with all embeddings in the database, which can potentially be very expensive. When classifying with a softmax classifier, you only need to calculate the equation above to determine the embedding's class, because we have "memorized" the boundaries in w.
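For contrast, a rough sketch of the k-NN approach (hypothetical helper; note that every query scans the whole database):

```python
import numpy as np

def knn_classify(x, db_embeddings, db_labels, k=5):
    """Classify embedding x by majority vote among its k nearest database embeddings."""
    # compare the new embedding against *every* stored embedding: O(N * d) per query
    dists = np.linalg.norm(db_embeddings - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(db_labels)[nearest], return_counts=True)
    return labels[np.argmax(counts)]  # majority vote among the k nearest neighbours
```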

But don't worry, it gets even more complicated, because it's not so easy to train a softmax classifier 😄 That will be the topic of my next blog post.

I hope this clears things up, let me know if you have any other questions 🙂

yong2khoo-lm commented 3 years ago

Really, thanks for your detailed explanation + visuals. It really helps! 😃

a softmax classifier as a one-layer neural network where softmax is the activation function of the layer

This sentence sums everything up nicely. Apparently, Softmax can serve as an activation function too (apart from Sigmoid and ReLU). I have a better picture now. You are using the neural network approach instead of a traditional one such as SVM, etc.

When classifying with a softmax classifier, you only need to calculate the equation above to determine the embedding's class, because we have "memorized" the boundaries in w

👍 👍 Sounds great!

But don't worry, it gets even more complicated, because it's not so easy to train a softmax classifier 😄 That will be the topic of my next blog post

I really look forward to your blog post. Thanks again for the information~

ldulcic commented 3 years ago

Hi @kyygit!

Thanks! I'm glad I helped 🙂

a softmax classifier as a one-layer neural network where softmax is the activation function of the layer

This sentence sums everything up nicely. Apparently, Softmax can serve as an activation function too (apart from Sigmoid and ReLU). I have a better picture now. You are using the neural network approach instead of a traditional one such as SVM, etc.

Softmax is commonly used as the last layer of neural networks for multi-class classification because it transforms the raw output of the network into a nice representation of class probabilities, which is what you actually want from a classifier. You wouldn't use it as the activation function of a hidden layer; ReLU, tanh and sigmoid work better as hidden-layer activations.
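A small illustrative sketch of that convention (PyTorch-style; not this repo's code):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),            # hidden layers use ReLU (or tanh/sigmoid)
    nn.Linear(64, 5),
    nn.Softmax(dim=-1),   # softmax only on the last layer -> class probabilities
)
```

In practice, the final Softmax is often omitted during training in favour of nn.CrossEntropyLoss, which applies log-softmax internally.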

yong2khoo-lm commented 3 years ago

You wouldn't use it as the activation function of a hidden layer; ReLU, tanh and sigmoid work better as hidden-layer activations.

Noted :)