Closed jtibshirani closed 2 years ago
Pinging @elastic/es-search (Team:Search)
@javanna brought up an interesting question that I'll post here for context. Before, the DFS phase on each shard was quite lightweight, as it just collects term and field statistics. Now we will also run a kNN search, which can have a higher latency. Is it possible that certain parts of the code assume DFS will be very fast?
I am not aware of assumptions like this. We make sure to fork the DFS phase to the search threadpool, just like we do for the query phase. Also, calculating DFS involves parsing and rewriting the `Query`, then creating a `Weight`, which are not really guaranteed to be lightweight.
When we discussed making `DFS_QUERY_THEN_FETCH` the default search type (https://github.com/elastic/elasticsearch/issues/20554), the point was made that the DFS phase might not always be cheap: for example, intersecting a `fuzzy` query with the terms dictionary can be expensive, or the query may have many terms or many fields (the field expansion is often implicit, since by default Elasticsearch searches all fields). But to my knowledge we never followed up with proper benchmarking.
I'm going to close this out in favor of these two follow-ups:
Currently, approximate nearest neighbor search is powered by its own `_knn_search` endpoint. We introduced ANN in a separate endpoint because there were big open questions around how it should integrate with the search API (for example around score combination, aggregations, pagination, etc.). We now have a better understanding of many of these topics and plan to integrate ANN into `_search`.

### Proposed API
ANN will be powered by a new `knn` section in the search request. A search with `k: 5` and `size: 20` finds the global top 5 vector results, finds all term-based matches, combines them, and finally takes the top 20 top-scoring results. One way to think about it: it's just a normal search, but the `knn` section adds `k` additional matching documents based on vector similarity, like taking a boolean disjunction between the query and the ANN search. You can page through search results as usual, in case you've retrieved more interesting vector and term-based matches than fit on one page.

**Scoring.** In this initial version we'll compute the score of each hit as a weighted sum of the vector and term-based scores. This matches the usual way scores are combined in Elasticsearch, by summing query scores with optional boosts. In the future we can consider other ways of combining scores, like reciprocal rank fusion.
**Retrieval.** The top `k` vector results represent the global nearest neighbors across all shards. This contrasts with a shard-centric approach, where we would retrieve the `k` nearest neighbors per shard. In a shard-centric approach, there would actually be `k * num_shards` matches, which you could see while paging through results, etc. We think the global behavior will be more predictable and easier to understand, especially when searching across multiple indices, where each index could have a different number of shards.

The new `knn` section would still be marked experimental/technical preview. Making ANN generally available (GA) is a separate step.

### Implementation
A naive implementation would create a bool query on each shard, taking a disjunction between the term-based `query` and a "knn" query that retrieves the `k` nearest documents. But this would give the shard-centric behavior described above, where there are `k * num_shards` total vector matches, instead of a global top `k`.

Our current plan is to perform ANN during an initial search phase that runs before the query phase. Specifically:
1. During the DFS phase, retrieve the top `k` vector matches per shard, then combine them and keep the global top `k`. (DFS is an existing phase that runs before the query phase, and collects the distributed term statistics. We would be adding a new step to the phase.)
2. Rewrite the global top `k` vector matches into a special "doc and score" query that matches only those top documents. For the final query, take a boolean disjunction between this "doc and score" query and the term-based `query` that was passed in.

All other search phases (like fetching hits, field collapsing, etc.) work the same, as they just operate on the result of the query phase.
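To make the retrieval behavior concrete, here is a small sketch in plain Python (not the actual Java implementation; the shard results, boosts, and scores are made up) of merging per-shard vector results into a global top `k`, plus the weighted-sum score combination described above:

```python
import heapq

def global_top_k(per_shard_results, k):
    """Merge each shard's local top-k (doc_id, score) list into one global top k.

    A shard-centric approach would instead surface all k * num_shards entries.
    """
    all_matches = [m for shard in per_shard_results for m in shard]
    return heapq.nlargest(k, all_matches, key=lambda m: m[1])

def combined_score(query_score, knn_score, query_boost=1.0, knn_boost=1.0):
    """Weighted sum of term-based and vector scores, mirroring boost-based combination."""
    return query_boost * query_score + knn_boost * knn_score

# Three shards, each returning its local top k=2 vector matches.
shards = [
    [("doc1", 0.95), ("doc2", 0.80)],
    [("doc3", 0.90), ("doc4", 0.40)],
    [("doc5", 0.85), ("doc6", 0.30)],
]

top = global_top_k(shards, k=2)  # → [("doc1", 0.95), ("doc3", 0.90)], not 6 results
print(top)
print(combined_score(1.2, 0.95))  # roughly 2.15
```

Only the global top `k` documents would then be rewritten into the "doc and score" query on each shard, so paging never reveals the extra shard-local matches.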
### Steps

The feature will be developed on a feature branch.
Follow-ups (can be done after initial merge):