alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
https://alexklibisz.github.io/elastiknn
Apache License 2.0

Clarify the distinction between cosine, angular, and inner product similarity #277

Status: Closed (alexklibisz closed this issue 3 years ago)

alexklibisz commented 3 years ago

After a bit of review, mostly on Wikipedia, my revised understanding of these similarities is as follows:

Inner product similarity of two vectors is just their dot product:

$\text{innerProduct}(u, v) = u \cdot v = \sum_{i=1}^{n} u_i v_i$

Cosine similarity of two vectors is the dot product divided by the product of their norms:

$\text{cosine}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$

Angular similarity is one minus the normalized angle between two vectors:

$\text{angular}(u, v) = 1 - \frac{\arccos(\text{cosine}(u, v))}{\pi}$

So, angular similarity is a function of cosine similarity, which is a function of inner product similarity.
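To make the chain concrete with a small worked example: take $u = (1, 0)$ and $v = (1, 1)$. The inner product is $1$, the cosine similarity is $1 / (1 \cdot \sqrt{2}) \approx 0.707$, the angle is $\arccos(0.707) = \pi/4$, and the angular similarity is $1 - \frac{\pi/4}{\pi} = 0.75$.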

What are the practical differences? As far as I can tell, they are:

Elastiknn's current implementation of "angular" similarity is actually cosine similarity under the hood.

public static class Angular implements DenseFloat {
    @Override
    @ForceInline
    public double similarity(float[] v1, float[] v2) {
        double dotProd = 0.0;
        double v1SqrSum = 0.0;
        double v2SqrSum = 0.0;
        for (int i = 0; i < v1.length; i++) {
            dotProd += v1[i] * v2[i];
            v1SqrSum += Math.pow(v1[i], 2);
            v2SqrSum += Math.pow(v2[i], 2);
        }
        double denom = Math.sqrt(v1SqrSum) * Math.sqrt(v2SqrSum);
        // This is 1 + cosine similarity (range [0, 2]), i.e. cosine similarity, not angular similarity.
        if (denom > 0) return 1 + (dotProd / denom);
        // denom == 0 means at least one vector is all zeros; identical vectors score as maximally similar.
        else if (Arrays.equals(v1, v2)) return 2;
        // Exactly one vector is all zeros: maximally dissimilar.
        else return 0;
    }
}

So that's a mistake.

Elastiknn also has two approximate algorithms for angular similarity: AngularLshModel and PermutationLshModel. Next I need to do some thinking on which specific similarity these models actually approximate.

The Angular LSH model takes a fixed set of random gaussian vectors and computes the dot product of a given vector (for indexing or searching) against each of the random vectors. Then it concatenates the signs of these dot products to build hashes.
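A minimal sketch of that idea is below (hypothetical class and parameter names, not the plugin's actual AngularLshModel):

    import java.util.Random;

    // Sketch of sign-of-random-projection ("random hyperplane") hashing.
    class RandomHyperplaneHasher {
        private final float[][] hyperplanes; // numHashes random gaussian vectors, each of length dims

        RandomHyperplaneHasher(int dims, int numHashes, long seed) {
            Random rng = new Random(seed);
            hyperplanes = new float[numHashes][dims];
            for (int h = 0; h < numHashes; h++)
                for (int d = 0; d < dims; d++)
                    hyperplanes[h][d] = (float) rng.nextGaussian();
        }

        // One bit per hyperplane: the sign of the dot product with the input vector.
        boolean[] hash(float[] v) {
            boolean[] bits = new boolean[hyperplanes.length];
            for (int h = 0; h < hyperplanes.length; h++) {
                double dot = 0.0;
                for (int d = 0; d < v.length; d++) dot += hyperplanes[h][d] * v[d];
                bits[h] = dot >= 0; // scaling v by any positive constant leaves this unchanged
            }
            return bits;
        }
    }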

The Permutation LSH model describes a vector by the k indices (positions in the vector) with the greatest absolute values. The intuition is that each index corresponds to some latent concept, and indices with high absolute values carry more information about their respective concepts than those with low absolute values. The research for this method has focused mainly on Angular similarity, though the implementation supports Angular, L1, and L2.
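Roughly, the hashing step could be sketched like this (a simplified sketch with hypothetical names; the real PermutationLshModel has more detail):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.stream.IntStream;

    // Sketch: describe a vector by the k indices with the largest absolute values,
    // ordered by decreasing magnitude.
    class PermutationSketch {
        static int[] topKIndices(float[] v, int k) {
            Integer[] idx = IntStream.range(0, v.length).boxed().toArray(Integer[]::new);
            Arrays.sort(idx, Comparator.comparingDouble(i -> -Math.abs(v[i])));
            return Arrays.stream(idx).limit(k).mapToInt(Integer::intValue).toArray();
        }
    }

For example, [99, 10, 13, -25] maps to the index sequence [0, 3, 2] for k=3, and so does any positive rescaling of it.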

In both cases, the magnitude of the vector matters. So it seems more correct to consider these methods as approximations for the inner product. However, if we actually want the cosine similarity, I think we can just pre-process the vectors such that they have unit norm. The current implementation actually pretty much assumes you've done this, as it uses the cosine similarity function to re-rank the approximate candidates.
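For example, a caller could normalize vectors before indexing and searching (a sketch of the pre-processing step, not something the plugin does for you):

    // Scale a vector to unit L2 norm so that its dot products equal cosine similarities.
    static float[] toUnitNorm(float[] v) {
        double sqrSum = 0.0;
        for (float x : v) sqrSum += x * x;
        double norm = Math.sqrt(sqrSum);
        if (norm == 0) return v.clone(); // an all-zero vector cannot be normalized
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) unit[i] = (float) (v[i] / norm);
        return unit;
    }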

So my thinking for next steps is this:

Edit, after some more thinking:

For Angular LSH, the magnitude of the vector does not matter, as we only look at the sign of the dot product. For Permutation LSH, the absolute magnitude also does not matter, as we only look at the relative ordering of magnitudes. For example, v1=[99, 10, 13, -25], v2=[9.9, 1.0, 1.3, -2.5], and v3=[0.99, 0.1, 0.13, -0.25] will each produce the same set of hashes for both of these methods.

I verified this by adding tests here: #292
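The check is essentially the following, reusing the hypothetical RandomHyperplaneHasher and PermutationSketch sketched above (an illustration, not the actual tests in #292):

    import java.util.Arrays;

    class ScaleInvarianceCheck {
        public static void main(String[] args) {
            // Scaling a vector by a positive constant should not change its hashes.
            float[] v1 = {99f, 10f, 13f, -25f};
            float[] v2 = {9.9f, 1.0f, 1.3f, -2.5f};
            RandomHyperplaneHasher hasher = new RandomHyperplaneHasher(4, 64, 0L);
            System.out.println(Arrays.equals(hasher.hash(v1), hasher.hash(v2)));                   // expected: true
            System.out.println(Arrays.equals(PermutationSketch.topKIndices(v1, 3),
                                             PermutationSketch.topKIndices(v2, 3)));               // expected: true
        }
    }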

Each of these methods uses cosine similarity to re-rank the approximate candidates. Again, cosine similarity is incorrectly referred to as "angular" similarity in the API and docs. So the re-ranking is also invariant to vector magnitude, which means both the approximate and exact "angular" models are invariant to vector magnitude.

To summarize, I see two main problems:

  1. The "Angular" similarity referenced in the API and docs is actually implemented as cosine similarity.
  2. There is no way to compute a similarity in which vector magnitude affects the score, i.e., it is impossible to compute true inner-product (dot-product) similarity.

The second problem is not solvable, as the approximate methods cannot retrieve candidates in a way that considers vector magnitude. I'm actually not 100% sure anyone would want to do this anyway.

So I think there are two possible solutions:

  1. Rename "angular" similarity to "cosine" similarity (but keep "angular" for backwards compatibility), or
  2. Update the angular similarity method so that it actually computes the angular similarity.

Right now I'm leaning towards the first option.
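For reference, option 2 would amount to something like this (a sketch of the fix assuming `java.util.Arrays` is imported, not the released implementation):

    // Sketch of option 2: return true angular similarity instead of cosine similarity.
    // Clamping guards against floating point drift before Math.acos.
    public double angularSimilarity(float[] v1, float[] v2) {
        double dotProd = 0.0, v1SqrSum = 0.0, v2SqrSum = 0.0;
        for (int i = 0; i < v1.length; i++) {
            dotProd += v1[i] * v2[i];
            v1SqrSum += v1[i] * v1[i];
            v2SqrSum += v2[i] * v2[i];
        }
        double denom = Math.sqrt(v1SqrSum) * Math.sqrt(v2SqrSum);
        if (denom == 0) return Arrays.equals(v1, v2) ? 1 : 0;
        double cosine = Math.max(-1.0, Math.min(1.0, dotProd / denom));
        return 1 - Math.acos(cosine) / Math.PI; // angular similarity in [0, 1]
    }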


Some other misc. notes:

alexklibisz commented 3 years ago

Resolved in 7.13.3.1 release: https://github.com/alexklibisz/elastiknn/releases/tag/7.13.3.1