mayya-sharipova closed this issue 2 years ago.
Posting a friendly annual reminder that the KNN/ANN feature is already implemented in https://github.com/alexklibisz/elastiknn. :)
I'd encourage folks to try out the Elastiknn plugin while waiting on Elastic to integrate Lucene's ANN implementation. Speaking from experience, there are a number of non-trivial problems to solve to ensure a reliable, scalable implementation of ANN.
AFAIK, Elastic will be using Lucene's HNSW-based ANN implementation. Elastiknn uses a different approach called LSH (Locality Sensitive Hashing). The two approaches have some interesting tradeoffs, which Pinecone's blog post describes at a high level: https://www.pinecone.io/learn/vector-indexes/. For a deeper dive into LSH, I wrote this post: https://elastiknn.com/posts/tour-de-elastiknn-august-2021/
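For readers new to LSH, here is a minimal sketch of one classic scheme, random-hyperplane LSH for cosine similarity. It is a generic illustration, not Elastiknn's code or API, and it does not show HNSW (a graph-based index) at all. The names `HyperplaneLSHIndex`, `make_hyperplanes`, `signature`, `n_tables` and `n_bits` are hypothetical and chosen only for this example. Each vector is hashed to one bit per random hyperplane (which side of the plane it falls on), similar vectors tend to land in the same bucket, and only the colliding candidates are re-ranked with the exact similarity.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the positive side, else 0."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(p, vec)) >= 0 else 0
        for p in planes
    )


def cosine(a, b):
    """Exact cosine similarity; returns 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSHIndex:
    """Hypothetical toy index: several hash tables, each with its own hyperplanes."""

    def __init__(self, dim, n_tables=4, n_bits=8, seed=0):
        self.tables = [
            (make_hyperplanes(dim, n_bits, seed + t), defaultdict(list))
            for t in range(n_tables)
        ]
        self.vectors = {}

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(doc_id)

    def query(self, vec, k):
        # Approximate step: collect every document that shares a bucket
        # with the query in at least one table.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), []))
        # Exact step: re-rank only the candidates by true cosine similarity.
        ranked = sorted(
            candidates, key=lambda d: cosine(vec, self.vectors[d]), reverse=True
        )
        return ranked[:k]
```

In this kind of scheme, adding tables tends to raise recall (more chances to collide with a true neighbour), while adding bits per table makes buckets smaller and queries cheaper at the cost of missing some neighbours; tuning that tradeoff is part of what makes a production implementation non-trivial.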
I'm sure Elastic can eventually get this integrated, but if you're excited to try out ANN for a proof-of-concept or just to learn about KNN/ANN, Elastiknn is ready to try, and I'd be happy to hear any feedback.
Cheers
Here is another Elasticsearch kNN plugin. It allows pre-filtering for multimodal search and scales to billions of documents with ANN. https://www.gsitechnology.com/sites/default/files/AppNotes/GSIT-Elasticsearch-Plugin-AppBrief.pdf
Is there any update on this? Could you also provide a rough timeline for such an integration?
I'm sorry for the delay; it took some time to reach consensus on a plan. I opened an issue here to track the implementation: https://github.com/elastic/elasticsearch/issues/78473. Since this feature will be based on Lucene's new ANN support (which is shipping in its upcoming 9.0 release), it only targets Elasticsearch 8.x. We don't give exact release dates, but for context, we are hard at work on both ANN and the 8.0 release.
I'm going to close this now that we have an implementation plan here: #78473. Thank you so much for the insights and feedback on this issue (and your incredible patience)!
ANN (approximate nearest neighbours) will be a licensed feature of Elasticsearch (not OSS).
We plan to implement prototypes of various ANN algorithms for different distance metrics:
We are interested in users' feedback about:
We have decided to adopt Lucene's implementation of ANN search, so ANN search development has moved to Lucene. Relevant Lucene issues: https://issues.apache.org/jira/browse/LUCENE-9004, https://issues.apache.org/jira/browse/LUCENE-9322, https://issues.apache.org/jira/browse/LUCENE-9136