aaalgo / kgraph

A library for k-nearest neighbor search
BSD 2-Clause "Simplified" License

How to compile kgraph with OpenBLAS? #20

Closed suzaku closed 6 years ago

suzaku commented 6 years ago

I'm using the Python binding of kgraph. I've tried the following to enable OpenBLAS:

  1. Update the OpenBLAS LDLIBS variable in this Makefile
  2. Run python setup.py install
  3. Pass blas=True when searching

But the speed was the same with or without the blas=True argument. What can I do to make sure OpenBLAS is used?

By the way, kgraph is the fastest ANN package (at the same level of recall) I have found so far for my data. But it's not as fast as I expected: it's about ten times faster than my highly optimized brute-force implementation with sklearn, while I expected it to be at least 50 times faster. I built numpy with OpenBLAS for my brute-force implementation, so I want to make sure kgraph also uses OpenBLAS and then try again.

Thanks.

aaalgo commented 6 years ago

If you use the Python API, then the similarity metric is L2-like. There's no matrix operation involved, so BLAS won't help even in theory. I didn't use BLAS in the implementation.

A 10x speedup over brute force with an acceptable loss of recall is typical. 50x is rather difficult to achieve; you would need a very big dataset to demonstrate that.

suzaku commented 6 years ago

@aaalgo

Thanks for replying.

I know that for exact L2 space KNN, BLAS won't help.

But strictly speaking, the brute-force algorithm provided by sklearn is not exact KNN (it doesn't use the most numerically precise way to calculate the distances). The formula sklearn uses for Euclidean distances is: dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)).

According to the docstring for euclidean_distances:

This formulation has two advantages over other ways of computing distances. First, it is computationally efficient when dealing with sparse data. Second, if one argument varies but the other remains unchanged, then dot(x, x) and/or dot(y, y) can be pre-computed.
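As a small sketch (not kgraph or sklearn code, just the identity itself), the expansion can be checked against the direct computation with numpy; the vectors here are arbitrary illustrative data:

```python
import numpy as np

# Arbitrary illustrative vectors.
rng = np.random.default_rng(0)
x = rng.standard_normal(128)
y = rng.standard_normal(128)

# Direct Euclidean distance.
direct = np.sqrt(np.sum((x - y) ** 2))

# Expanded form: sqrt(dot(x, x) - 2*dot(x, y) + dot(y, y)).
# When one side is fixed, dot(x, x) and/or dot(y, y) can be cached.
expanded = np.sqrt(np.dot(x, x) - 2.0 * np.dot(x, y) + np.dot(y, y))
```

The two agree up to floating-point rounding; the expanded form can lose a little precision through cancellation when x and y are very close, which is why it is sometimes called non-exact.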

Do you think this algorithm would be helpful in the implementation of kgraph?

BTW: when I said 10x faster, I was comparing kgraph with this non-exact, OpenBLAS-enhanced KNN with dot(y, y) cached.

aaalgo commented 6 years ago

Your dist(x, y) formulation with dot is PRECISE. If X is your database and Y is the query set, then dist(X, Y) can be formulated as a matrix multiplication, hence the BLAS speedup, which can be as large as 30x. When one uses an index like kgraph, not all dist(x, y) are computed for every x in X and y in Y. Each dist(x, y) that is actually needed is calculated separately using 1-D vector operations, and such operations do not benefit from BLAS.

suzaku commented 6 years ago

I mean, if kgraph ever needs to compare the query vector with tens or hundreds of other vectors (not the whole database), is it possible to somehow do it in bulk and use that formula to get a BLAS speedup?

aaalgo commented 6 years ago

Unfortunately, there's currently no easy way to do it in bulk. In general there are two ways to speed things up: 1. make each operation faster; this is BLAS. 2. perform fewer operations; this is KGraph. The number of operations is reduced by carefully planning which operation to do next after seeing the result of the current one. That process is inevitably sparse and out of order, so we cannot take advantage of cache locality the way BLAS does.

suzaku commented 6 years ago

Got it, thanks.