elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Investigate various implementations of ann search for vector fields #42326

Closed mayya-sharipova closed 2 years ago

mayya-sharipova commented 5 years ago

ANN (approximate nearest neighbours) will be a licensed feature of Elasticsearch (not OSS).

We plan to implement prototypes of various ANN algorithms for different distance metrics:

We are interested in users' feedback about:

We have decided to adopt Lucene's implementations of ANN search, so development of ANN search has moved to Lucene. Relevant Lucene issues: https://issues.apache.org/jira/browse/LUCENE-9004, https://issues.apache.org/jira/browse/LUCENE-9322, https://issues.apache.org/jira/browse/LUCENE-9136

alexklibisz commented 2 years ago

Posting a friendly annual reminder that the KNN/ANN feature is already implemented in https://github.com/alexklibisz/elastiknn. :)

I'd encourage folks to try out the Elastiknn plugin while waiting on Elastic to integrate Lucene's ANN implementation. Speaking from experience, there are a number of non-trivial problems to solve to ensure a reliable, scalable implementation of ANN.

AFAIK, Elastic will be using the HNSW-based ANN implementation from Lucene. Elastiknn uses another approach called LSH (Locality Sensitive Hashing). They have some interesting tradeoffs, which are described at a high level in Pinecone's blog post: https://www.pinecone.io/learn/vector-indexes/. For a deeper dive on LSH, I wrote this post: https://elastiknn.com/posts/tour-de-elastiknn-august-2021/
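
To give a feel for the LSH idea, here is a minimal sketch of one common variant, random-hyperplane hashing for cosine similarity. It is only an illustration and is not taken from Elastiknn or Lucene; the function names (`make_hyperplanes`, `lsh_signature`, `build_index`, `candidates`) and the single-table design are assumptions made for this example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (as normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group document ids into buckets keyed by their LSH signature."""
    buckets = {}
    for doc_id, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(doc_id)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return ids that share the query's bucket (approximate neighbours)."""
    return buckets.get(lsh_signature(query, hyperplanes), [])


# Hypothetical toy data: "a" and "b" point in opposite directions.
planes = make_hyperplanes(num_bits=8, dim=3, seed=42)
index = build_index({"a": [1.0, 2.0, 3.0], "b": [-1.0, -2.0, -3.0]}, planes)
print(candidates([2.0, 4.0, 6.0], index, planes))
```

Vectors with a small angle between them are likely to fall on the same side of most hyperplanes and so land in the same bucket, which means only that bucket needs exact distance computation. A practical implementation would typically use several hash tables and re-rank the candidates by exact similarity; this sketch leaves both out.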

I'm sure Elastic can eventually get this integrated, but if you're excited to try out ANN for a proof-of-concept or just to learn about KNN/ANN, Elastiknn is ready to try, and I'd be happy to hear any feedback.

Cheers

plassr commented 2 years ago

Here is another Elasticsearch kNN plugin. It allows pre-filtering for multimodal search and scales to billions of documents with ANN. https://www.gsitechnology.com/sites/default/files/AppNotes/GSIT-Elasticsearch-Plugin-AppBrief.pdf

jtibshirani commented 2 years ago

> Is there any update on this? Could you also provide a rough time-plan for such an integration?

I'm sorry for the delay, it took some time to find consensus on a plan. I opened an issue here to track the implementation: https://github.com/elastic/elasticsearch/issues/78473. Since this feature will be based on Lucene's new ANN support (which is shipping in its upcoming 9.0 release), it only targets Elasticsearch 8.x. We don't give exact release dates, but for context we are hard at work on both ANN and the 8.0 release.

jtibshirani commented 2 years ago

I'm going to close this now that we have an implementation plan here: #78473. Thank you so much for the insights and feedback on this issue (and your incredible patience)!