deepset-ai / haystack

🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Support Elasticsearch Approximate Nearest Neighbors #5557

Closed: AndreasR90 closed this 10 months ago

AndreasR90 commented 1 year ago

Elasticsearch 8.0 and later include an implementation of approximate nearest neighbor (aNN) search based on HNSW. The corresponding blog post https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0 indicates that this significantly speeds up queries compared with the exact kNN match currently used. The obvious downside is that not all of the actual nearest neighbors are guaranteed to be found. In my opinion, the choice of algorithm should be left to the Haystack user.
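To make the trade-off concrete: an exact kNN search scores every stored vector against the query, while an approximate (HNSW) search may miss some of the true top-k. A common way to quantify that loss is recall@k, the fraction of the exact top-k that the approximate search also returned. The following is a minimal pure-Python sketch, assuming cosine similarity as the metric; `exact_knn` and `recall_at_k` are illustrative helpers, not Haystack or Elasticsearch APIs.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_knn(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force kNN: score every vector, return the ids of the k best."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    # Highest score first; ties broken by the lower id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def recall_at_k(true_ids: List[int], approx_ids: List[int]) -> float:
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
truth = exact_knn([1.0, 0.1], vectors, k=2)  # [0, 2]
# Suppose a hypothetical approximate search returned ids [0, 1]:
print(recall_at_k(truth, [0, 1]))  # 0.5
```

Exposing the choice to the user, as proposed here, amounts to letting them decide whether a recall below 1.0 is acceptable in exchange for faster queries.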

It would be ideal to have an additional argument for the ElasticsearchDocumentStore (Elasticsearch >= 8) that lets the user choose which query is used.

anakin87 commented 1 year ago

Hello, @AndreasR90!

Related: #2810

@bogdankostic do you have any insights to share on this point?

bogdankostic commented 1 year ago

I haven't tried approximate kNN with Elasticsearch 8 yet, but I agree with @AndreasR90 that we should allow users to set the index_type for ElasticsearchDocumentStore, just as we do with OpenSearch.

I had a quick look at the Elasticsearch documentation, and it seems that Elasticsearch always creates an HNSW index, so indexing time wouldn't even increase for users who choose approximate kNN instead of exact kNN with Elasticsearch 8.
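For reference, the HNSW graph is built for dense_vector fields that are indexed. The sketch below shows such an index mapping as a Python dict with index: true set explicitly. The field name "embedding", the 768 dimensions, the dot_product similarity and the extra "content" text field are illustrative assumptions, and `build_knn_mapping` is a hypothetical helper, not part of Haystack.

```python
import json
from typing import Any, Dict


def build_knn_mapping(
    field: str = "embedding",   # assumed name of the vector field
    dims: int = 768,            # assumed embedding dimension
    similarity: str = "dot_product",
) -> Dict[str, Any]:
    """Index mapping with an indexed dense_vector field (illustrative values)."""
    return {
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    # Indexing the vectors is what builds the HNSW graph
                    # used by approximate kNN search.
                    "index": True,
                    # Similarity metric used when searching the graph.
                    "similarity": similarity,
                    # Optional HNSW parameters; the values shown are the
                    # documented defaults.
                    "index_options": {"type": "hnsw", "m": 16, "ef_construction": 100},
                },
            }
        }
    }


print(json.dumps(build_knn_mapping(), indent=2))
```

Because the graph is built at indexing time either way, the same index could serve both exact (script_score) and approximate (knn) queries.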

To perform an approximate knn search, we would just need to set the knn option in the request body instead of using script_score.
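The difference between the two request bodies can be sketched as follows. `build_embedding_query` and its `approximate` flag are hypothetical (they illustrate the user-facing switch proposed above and are not Haystack's API). The field name "embedding" and the default `num_candidates` of 100 are assumptions. The knn body follows the top-level knn option of the `_search` API in later Elasticsearch 8.x releases.

```python
from typing import Any, Dict, List


def build_embedding_query(
    query_emb: List[float],
    top_k: int = 10,
    approximate: bool = False,   # hypothetical user-facing switch
    num_candidates: int = 100,   # assumed default candidate pool per shard
    field: str = "embedding",    # assumed vector field name
) -> Dict[str, Any]:
    """Return a _search request body for exact or approximate kNN."""
    if approximate:
        # Approximate kNN: Elasticsearch searches the HNSW graph.
        return {
            "size": top_k,
            "knn": {
                "field": field,
                "query_vector": query_emb,
                "k": top_k,
                # num_candidates may not be smaller than k.
                "num_candidates": max(num_candidates, top_k),
            },
        }
    # Exact kNN: script_score scores every document matched by match_all.
    return {
        "size": top_k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_emb},
                },
            }
        },
    }


print(build_embedding_query([0.1, 0.2, 0.3], top_k=5, approximate=True))
```

Keeping both branches behind one flag leaves exact search as the default, so existing behavior would only change for users who opt in.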

AndreasR90 commented 1 year ago

I had a closer look at this yesterday and have a first implementation of this feature. I can open a draft PR this afternoon. What do you think, @bogdankostic?

bogdankostic commented 1 year ago

@AndreasR90 Yes, creating a draft PR would be awesome. ⭐

AndreasR90 commented 1 year ago

Hi @bogdankostic, as promised, I opened the draft PR yesterday. Feel free to have a look and provide feedback 😊

masci commented 10 months ago

Closing as won't fix: Haystack 2.x supports HNSW.