Closed AndreasR90 closed 10 months ago
Hello, @AndreasR90!
Related: #2810
@bogdankostic do you have any insights to share on this point?
I haven't tried approximate kNN with Elasticsearch 8 yet, but I agree with @AndreasR90 that we should allow setting the index_type for ElasticsearchDocumentStore, just as we do with OpenSearch.
I had a quick look at the Elasticsearch documentation, and it seems that Elasticsearch always creates an index of type HNSW, so indexing time wouldn't even increase for users who decide to use approximate kNN instead of exact kNN with Elasticsearch 8.
To perform an approximate kNN search, we would just need to set the knn option in the request body instead of using script_score.
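To make the difference concrete, here is a minimal sketch of the two request bodies as plain Python dicts. The helper names `exact_knn_body` and `approximate_knn_body`, the field name `embedding`, and the default `num_candidates` are assumptions for illustration, not Haystack code. The sketch also assumes the embedding field is a `dense_vector` indexed for kNN and that the Elasticsearch 8.x release in use accepts the `knn` option in the search request body.

```python
from typing import Any, Dict, List


def exact_knn_body(query_emb: List[float], top_k: int, field: str = "embedding") -> Dict[str, Any]:
    """Exact kNN: score every matching document with a script_score query."""
    return {
        "size": top_k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_emb},
                },
            }
        },
    }


def approximate_knn_body(
    query_emb: List[float],
    top_k: int,
    field: str = "embedding",
    num_candidates: int = 100,  # hypothetical default
) -> Dict[str, Any]:
    """Approximate kNN: use the top-level knn option (HNSW-backed)."""
    return {
        "size": top_k,
        "knn": {
            "field": field,
            "query_vector": query_emb,
            "k": top_k,
            # num_candidates must not be smaller than k.
            "num_candidates": max(num_candidates, top_k),
        },
    }
```

Either dict could then be sent as the body of a search request. Raising `num_candidates` usually improves recall and makes the query slower.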
I had a closer look into this yesterday and have a first implementation of this feature. I can create a draft PR this afternoon. What do you think, @bogdankostic?
@AndreasR90 Yes, creating a draft PR would be awesome. ⭐
Hi @bogdankostic, as promised I opened the Draft PR yesterday. Feel free to have a look and provide feedback :blush:
Closing as won't fix: Haystack 2.x supports HNSW.
Elasticsearch >=8.0 has an implementation of an aNN (approximate nearest neighbor) algorithm based on HNSW. The corresponding blog post https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0 indicates that this gives a significant speedup in query times compared to the exact kNN match currently used. The obvious downside is that not all actual nearest neighbors are found. In my opinion, the decision of which algorithm to use should be left to the Haystack user.
It would be ideal to have an additional argument for the ElasticsearchDocumentStore (>=8) that lets the user choose which query is used.
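As a rough sketch of what such an argument could look like, the class below takes a hypothetical `knn_search_type` parameter, validates it once, and dispatches to the matching request body. The class name `KnnQueryChooser`, the parameter names, and the defaults are all assumptions for illustration and not part of the Haystack API.

```python
from typing import Any, Dict, List

VALID_SEARCH_TYPES = ("exact", "approximate")


class KnnQueryChooser:
    """Hypothetical stand-in for the document store's query-building logic."""

    def __init__(
        self,
        knn_search_type: str = "exact",  # hypothetical argument name
        embedding_field: str = "embedding",
        num_candidates: int = 100,
    ) -> None:
        # Fail early on a misspelled option instead of at query time.
        if knn_search_type not in VALID_SEARCH_TYPES:
            raise ValueError(
                f"knn_search_type must be one of {VALID_SEARCH_TYPES}, got {knn_search_type!r}"
            )
        self.knn_search_type = knn_search_type
        self.embedding_field = embedding_field
        self.num_candidates = num_candidates

    def build_body(self, query_emb: List[float], top_k: int) -> Dict[str, Any]:
        """Return the search request body for the configured search type."""
        if self.knn_search_type == "approximate":
            return {
                "size": top_k,
                "knn": {
                    "field": self.embedding_field,
                    "query_vector": query_emb,
                    "k": top_k,
                    "num_candidates": max(self.num_candidates, top_k),
                },
            }
        # Default: exact kNN via script_score over all documents.
        source = f"cosineSimilarity(params.query_vector, '{self.embedding_field}') + 1.0"
        return {
            "size": top_k,
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {"source": source, "params": {"query_vector": query_emb}},
                }
            },
        }
```

Defaulting to `"exact"` keeps existing behaviour unchanged, so users would opt in to the approximate search explicitly.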