facebookresearch / pysparnn

Approximate Nearest Neighbor Search for Sparse Data in Python!

Speedup pysparnn with CUDA #10

Closed vmarkovtsev closed 7 years ago

vmarkovtsev commented 7 years ago

Hi!

I think this project is similar to the toolchain we are currently using in production: https://github.com/ekzhu/datasketch + https://github.com/src-d/minhashcuda. Do you think it is possible to leverage minhashcuda in pysparnn?

spencebeecher commented 7 years ago

Hi @vmarkovtsev !

I just had a look at both projects and they are both extremely impressive. The hashing approaches used in minhashcuda and datasketch are an alternative to the method pysparnn uses. Since you are already familiar with minhashing, I'll try to describe the process for pysparnn:

  1. Given n records, randomly select sqrt(n) records and use those as 'candidates'.
  2. For each record, assign it to the closest candidate.

The above can actually be done recursively to optimize the matrix multiply size (instead of 2 levels you can do N levels). Sometimes the data is not distributed evenly across the clusters (parts of the space are much denser than others), so this recursion can be important.

The top level is represented as a matrix.

Search is done by iteratively finding the closest cluster and then returning the closest root-level items. Cosine similarity is implemented as a matrix-vector multiply.
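
To make steps 1 and 2 and the search concrete, here is a rough, self-contained sketch of the two-level version (illustrative only, not pysparnn's actual code; it assumes scipy CSR input and normalizes rows so a dot product equals cosine similarity):

```python
# Illustrative two-level version of the scheme above (not pysparnn's code).
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize

def build_index(features):
    """Pick sqrt(n) random records as candidates, then assign each record to
    its closest candidate (steps 1 and 2 above, without the recursion)."""
    features = normalize(csr_matrix(features))                # unit-length rows
    n = features.shape[0]
    cand_idx = np.random.choice(n, int(np.sqrt(n)), replace=False)
    candidates = features[cand_idx]
    sims = features.dot(candidates.T).toarray()               # cosine similarities
    assignment = sims.argmax(axis=1)                          # closest candidate
    clusters = {c: np.where(assignment == c)[0]
                for c in range(len(cand_idx))}
    return features, candidates, clusters

def search(query, features, candidates, clusters, k=5):
    """Find the closest cluster, then the closest records inside it."""
    query = normalize(csr_matrix(query))
    best_cluster = int(query.dot(candidates.T).toarray().argmax())
    members = clusters[best_cluster]
    sims = query.dot(features[members].T).toarray().ravel()   # matrix-vector multiply
    return members[np.argsort(-sims)[:k]]
```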

Hopefully you can see how pysparnn takes a different approach from sketching/min-hash. Compared to the LSH module in scikit-learn, pysparnn tends to perform better in terms of speed and recall on a single CPU (I think the LSH code is much more easily scalable in theory). See the examples/ directory for the datasets I used. I have not compared against datasketch, so I don't know that benchmark =)

Having said all that - I think pysparnn could benefit from GPUs, since the search reduces to iterative matrix-matrix multiplies.

The lib was designed to be easily extensible. If you are interested, all you would have to do is provide a custom subclass of MatrixMetricSearch - https://github.com/facebookresearch/pysparnn/blob/master/pysparnn/matrix_distance.py#L133
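
Purely as a shape-of-the-idea sketch (this does not implement the real MatrixMetricSearch interface; the actual abstract methods are in matrix_distance.py at the link above, and the CuPy/cuSPARSE usage here is just my assumption about how a GPU backend could look):

```python
# Hypothetical GPU-backed cosine search with roughly the shape pysparnn
# needs (store a record matrix, answer batched nearest-neighbor queries).
# NOT the real MatrixMetricSearch interface -- see matrix_distance.py.
import numpy as np
import scipy.sparse as sp
import cupy as cp
from cupyx.scipy import sparse as cusparse
from sklearn.preprocessing import normalize

class GPUCosineSearch(object):
    def __init__(self, features, records_data):
        # unit-length rows so a dot product is a cosine similarity
        normalized = normalize(sp.csr_matrix(features))
        self.records_data = np.array(records_data)
        self.gpu_matrix = cusparse.csr_matrix(normalized)         # keep records on device

    def nearest_search(self, query_features, k=1):
        # query norms do not change the ranking, so queries are used as-is
        queries = cusparse.csr_matrix(sp.csr_matrix(query_features))
        sims = self.gpu_matrix.dot(queries.T.tocsr()).toarray()   # records x queries
        top = cp.asnumpy(cp.argsort(-sims, axis=0)[:k])           # top-k per query
        return [self.records_data[top[:, i]] for i in range(top.shape[1])]
```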

I would gladly take a PR to include CUDA code though we should chat about where the contribution would go in the repository.

In any case, I am a fan of your work and your article - https://blog.sourced.tech/post/minhashcuda/

Very nice!

vmarkovtsev commented 7 years ago

@spencebeecher Great! Thank you very much for this warm feedback! You made my day.

I would contribute CUDA code with pleasure, provided that your colleagues from Faiss are not up to it :-)

If everything boils down to matrix multiplication, I will be able to use cuBLAS GEMM and get peak performance. I think that if the matrix size becomes small (say, 16), it is faster to perform the multiplication with AVX instead, so it is going to be fun.
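
To illustrate the GEMM path (just a sketch; CuPy is one Python-level wrapper that routes the dense multiply through cuBLAS, and the small-matrix AVX fallback would be a separate code path):

```python
# Sketch: dense matrix multiply dispatched to cuBLAS sgemm via CuPy.
import cupy as cp

a = cp.random.rand(4096, 512, dtype=cp.float32)
b = cp.random.rand(512, 4096, dtype=cp.float32)
c = a @ b                              # cuBLAS GEMM on the GPU
cp.cuda.Stream.null.synchronize()      # wait for the kernel before timing/using c
```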

Renaming this issue to adjust the intent.

spencebeecher commented 7 years ago

@vmarkovtsev,

Actually, you bring up a great point - I missed the open-source release of Faiss!

The Faiss team already has a GPU implementation, and they implement a number of methods, including one similar to what pysparnn has. I will still take the PR, but I would note that Faiss already has that functionality and is generally more performant since they have a C++ implementation. If you have to 'make', I would go with Faiss.

I will say that pysparnn is still targeted towards sparse vectors, so if you have a sparse GPU implementation in mind, that would be interesting. I think Faiss would be interested too, but they are more targeted towards dense vectors (you would have to confirm future direction with them).

spencebeecher commented 7 years ago

Any updates? If I don't hear back in a week I am going to close this one out!

vmarkovtsev commented 7 years ago

Sorry, missed this. Indeed, Faiss works with dense vectors, and sparse input support is very valuable. We've got cuSPARSE, which is able to accelerate the ops, and we've got wrappers. Honestly, I don't think I will have time to play with it before April - but I would love to. You may close this and I will reopen it when ready.
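
As a rough illustration of the kind of wrapper I have in mind (assuming CuPy's scipy-compatible sparse module, which dispatches the sparse multiply to cuSPARSE):

```python
# Sketch: a sparse record matrix kept on the GPU, scored against a dense
# query vector; the sparse-dense multiply is executed by cuSPARSE.
import cupy as cp
import scipy.sparse as sp
from cupyx.scipy import sparse as cusparse

cpu_csr = sp.random(100000, 5000, density=0.001, format='csr', dtype='float32')
gpu_csr = cusparse.csr_matrix(cpu_csr)           # copy the CSR structure to the device
query = cp.random.rand(5000, dtype=cp.float32)   # dense query vector on the GPU
scores = gpu_csr.dot(query)                      # sparse x dense multiply (cuSPARSE)
best = int(cp.argmax(scores))                    # index of the best-scoring record
```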

spencebeecher commented 7 years ago

Great - Thanks for introducing me to such cool projects @vmarkovtsev !