elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add Euclidean Distance Support Like Solr [Enhancement] + CUDA Support #39109

Closed deimsdeutsch closed 5 years ago

deimsdeutsch commented 5 years ago

I am doing machine learning that involves manipulating matrices, and over time we have stored many double arrays, i.e. arrays with 1024 dimensions. Calculating the Euclidean distance between millions of such arrays takes a toll on a front-end application and should be handled by the database. Doing this computation on the CPU is also time-consuming, taking more than 10-20 seconds; this is the issue we are facing with Solr's dist(2, ...) function. Using a GPU to do the same would take barely 50 ms. This is a trending feature that a huge number of people will be looking for. Please support and add this.

elasticmachine commented 5 years ago

Pinging @elastic/es-analytics-geo

jpountz commented 5 years ago

Related to #37947. cc @mayya-sharipova

jpountz commented 5 years ago

@deimsdeutsch Can you give us more information on what your use-case is? For instance are you trying to compute the nearest neighbors to a query vector?

mayya-sharipova commented 5 years ago

@deimsdeutsch Adding a new Euclidean distance function for [dense_vector](https://www.elastic.co/guide/en/elasticsearch/reference/master/dense-vector.html) is very easy, but it will again use the CPU. I don't think we currently have plans to use the GPU for calculations; it may require a lot of investigation.

deimsdeutsch commented 5 years ago

@jpountz Image similarity. Not nearest neighbor but https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy
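For reference, the computation being asked for (the distance from one query vector to every stored vector) can be sketched in NumPy roughly as follows. This is only an illustration of the brute-force calculation, not something Elasticsearch or Solr provides; the function name `euclidean_distances`, the random data, and the 10,000 x 1024 shape are assumptions chosen for the example.

```python
import numpy as np


def euclidean_distances(query, vectors):
    """Return the Euclidean distance from `query` to each row of `vectors`.

    `query` has shape (d,) and `vectors` has shape (n, d).
    """
    diff = vectors - query  # broadcasting subtracts `query` from every row
    # Row-wise sum of squares, then the square root of each sum.
    return np.sqrt(np.einsum("ij,ij->i", diff, diff))


# Hypothetical data: 10,000 stored vectors with 1024 dimensions each.
rng = np.random.default_rng(0)
stored = rng.random((10_000, 1024))
query = rng.random(1024)

dists = euclidean_distances(query, stored)
closest = int(np.argmin(dists))  # index of the most similar stored vector
```

The cost grows linearly with the number of stored vectors, which is why the rest of the thread turns to whether the data can be read fast enough to keep any processor busy.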

@mayya-sharipova You people have no idea how many people would want this feature. Almost every smart city project would be using this at petabyte scale.

dexception commented 5 years ago

+1 Would love to have this feature using CUDA. This can increase performance 100 times.

jpountz commented 5 years ago

@tveasey @stevedodson I heard via @benwtrent that you have been investigating running some computations on the GPU, you might want to have a look at this issue.

tveasey commented 5 years ago

The way we're thinking about using GPUs in ML is for model inference and possibly training (making use of some existing framework). These would be run using the persistent task framework and pass the work to processes running on specific nodes in the cluster.

If this is for image similarity then I'd have thought this would be for lookup of similar images and the natural approach is (some form of) approximate nearest neighbours search, which wouldn't benefit from hardware acceleration. (If one wants to compute an embedding of an image (with a convnet to create the vectors), a good use case for a GPU, and run this inside the cluster then that would be a fit for what's on our roadmap.)

In general, I think without having a clearer definition of the problem to be solved it is difficult to say whether using a GPU is advantageous: for many cases, reading vectors from disk and returning results over the network to an application would be the bottleneck not any computations done with them.

jpountz commented 5 years ago

@deimsdeutsch Can you clarify your use-case based on @tveasey's above comments?

deimsdeutsch commented 5 years ago

I just want to calculate the Euclidean distance on 1 billion vectors. Using a GPU would help by computing this in parallel on CUDA cores.

tveasey commented 5 years ago

I still think this observation applies:

> reading vectors from disk and returning results over the network to an application would be the bottleneck not any computations done with them

Actually loading 1B vectors from disk and sending back 1B floats (assuming this is what is needed) would be more expensive than computing 1B distances on CPU, so compute won't be the bottleneck. If you're saying what you need is all, n (n-1) / 2, pairwise distances, then I think there are more serious problems: what would you do with the 10^18 or so floats, etc.
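The scale behind this argument can be checked with some back-of-the-envelope arithmetic. The sketch below assumes the 1024-dimensional double arrays from the original report (8 bytes per value) and one 4-byte float per returned distance; both sizes are assumptions for illustration, not measurements.

```python
# Rough data-volume estimate for the 1-billion-vector case.
n = 1_000_000_000        # number of stored vectors (from the thread)
dims = 1024              # dimensions per vector (from the original report)
bytes_per_double = 8     # assumption: vectors are stored as 64-bit doubles
bytes_per_float = 4      # assumption: each distance is returned as a 32-bit float

input_bytes = n * dims * bytes_per_double   # data that must be read from disk
output_bytes = n * bytes_per_float          # distances sent back to the client
pairs = n * (n - 1) // 2                    # all pairwise distances

print(f"read from disk: {input_bytes / 1e12:.3f} TB")
print(f"returned:       {output_bytes / 1e9:.1f} GB")
print(f"pairwise count: {pairs:.3e}")
```

Under these assumptions, a single pass reads about 8 TB and returns about 4 GB, and the full pairwise case produces roughly 5 x 10^17 values, the same order of magnitude as the "10^18 or so" figure above.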

Most applications using distances care primarily about nearest neighbours (to each point) where the requirement is to avoid brute force complexity. The techniques for doing this wouldn't benefit from using a GPU.

jpountz commented 5 years ago

Closing based on the above discussion. Thanks for your input, @deimsdeutsch and @tveasey!

dexception commented 4 years ago

I hope you guys take a second look at this issue, because setting up a CPU cluster makes little sense when you can handle this on a single machine with a good GPU. Plus, with the CUDA BLAS library you can add additional operations.