elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Integrate ANN into `_search` endpoint #87625

Closed: jtibshirani closed this issue 2 years ago

jtibshirani commented 2 years ago

Currently, approximate nearest neighbor (ANN) search is powered by its own `_knn_search` endpoint. We introduced ANN as a separate endpoint because there were big open questions around how it should integrate with the search API (for example, score combination, aggregations, and pagination). We now have a better understanding of many of these topics and plan to integrate ANN into `_search`.

Proposed API

ANN will be powered by a new knn section in the search request:

POST products/_search
{
  "query": {
    "multi_match": {
      "query": "black flower dress",
      "fields": ["title", "description"],
      "boost": 0.9
    }
  },
  "knn": {
    "field": "title_vector",
    "query_vector": [0.3, 0.1, ...],
    "k": 5,
    "num_candidates": 50,
    "boost": 0.1
  },
  "size": 20
}

This search finds the global top k=5 vector results, finds all term-based matches, combines them, and finally takes the 20 top-scoring results. One way to think about it: it's just a normal search, but the knn section adds k additional matching documents based on vector similarity, like taking a boolean disjunction between the query and the ANN search. You can page through the results as usual, in case you've retrieved more interesting vector and term-based matches than fit on one page.
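
For example, since paging works as usual, fetching the second page could use the standard from and size parameters. A minimal sketch reusing the request above (the interaction of paging with the knn section is only described informally in this proposal):

POST products/_search
{
  "query": { "multi_match": { ... } },
  "knn": { "field": "title_vector", "query_vector": [0.3, 0.1, ...], "k": 5, "num_candidates": 50 },
  "from": 20,
  "size": 20
}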

Scoring. In this initial version we'll compute the score of each hit as a weighted sum of the vector and term-based scores. This matches the usual way scores are combined in Elasticsearch, by summing query scores with optional boosts. In the future we can consider other ways of combining scores like reciprocal rank fusion.
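
For illustration, using the boosts from the request above and hypothetical raw scores: a document that matches the multi_match query with a BM25 score of 4.0 and also appears in the top k with a vector similarity of 0.8 would score:

0.9 * 4.0 + 0.1 * 0.8 = 3.6 + 0.08 = 3.68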

Retrieval. The top k vector results represent the global nearest neighbors across all shards. This contrasts with a shard-centric approach, where we would retrieve the k nearest neighbors per shard. In a shard-centric approach there would actually be up to k * num_shards vector matches, which would show up while paging through results. We think the global behavior will be more predictable and easier to understand, especially when searching across multiple indices, where each index could have a different number of shards.
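
To make the difference concrete with hypothetical numbers:

k = 5, num_shards = 3
global top k:   exactly 5 vector matches in total
shard-centric:  up to k * num_shards = 15 vector matches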

The new knn section would still be marked experimental / technical preview. Making ANN generally available (GA) is a separate step.

Implementation

A naive implementation would create a bool query on each shard, taking a disjunction between the term-based query and a "knn" query that retrieves the k nearest documents. But this would give the shard-centric behavior described above, where there are k * num_shards total vector matches instead of a global top k.
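
A rough sketch of that naive per-shard query, assuming a hypothetical knn query clause (illustrative only, not the proposed API):

POST products/_search
{
  "query": {
    "bool": {
      "should": [
        { "multi_match": { "query": "black flower dress", "fields": ["title", "description"] } },
        { "knn": { "field": "title_vector", "query_vector": [0.3, 0.1, ...], "k": 5 } }
      ]
    }
  }
}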

Our current plan is to perform ANN during an initial search phase that runs before the query phase. Specifically:

  1. In the DFS phase, collect the top k vector matches per shard, combine them, and keep the global top k. (DFS is an existing phase that runs before the query phase and collects distributed term statistics. We would be adding a new step to this phase.)
  2. Then move to the query phase. Convert the global top k vector matches into a special "doc and score" query that matches only those top documents. For the final query, take a boolean disjunction between this "doc and score" query and the term-based query that was passed in (see the sketch after this list).
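
Conceptually, the rewritten query on each shard would then look something like the following pseudo-DSL, where doc_and_score stands in for the internal "doc and score" query and the doc IDs and scores are made up for illustration:

{
  "bool": {
    "should": [
      { "multi_match": { ... } },
      { "doc_and_score": { "docs": [3, 17, 42, 51, 60], "scores": [0.95, 0.91, 0.88, 0.84, 0.80] } }
    ]
  }
}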

All other search phases (like fetching hits and field collapsing) work the same, as they just operate on the results of the query phase.

Steps

The feature will be developed on a feature branch.

Follow-ups (can be done after initial merge):

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

jtibshirani commented 2 years ago

@javanna brought up an interesting question that I'll post here for context. Before, the DFS phase on each shard was quite lightweight, as it just collects term and field statistics. Now we will also run a kNN search, which can have a higher latency. Is it possible that certain parts of the code assume DFS will be very fast?

I am not aware of any assumptions like this. We make sure to fork the DFS phase to the search threadpool, just as we do for the query phase. Also, calculating DFS involves parsing and rewriting the Query and then creating a Weight, neither of which is guaranteed to be lightweight.

jpountz commented 2 years ago

When we discussed making DFS_QUERY_THEN_FETCH the default search type (https://github.com/elastic/elasticsearch/issues/20554), the point was made that the DFS phase might not always be cheap, e.g. intersecting a fuzzy query with the terms dictionary can be expensive, or the query can have many terms or many fields (which is often implicit, e.g. by default Elasticsearch searches all fields). But we never followed up with proper benchmarking, to my knowledge.

jtibshirani commented 2 years ago

I'm going to close this out in favor of these two follow-ups: