elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support exact search in a better way #97541

Open benwtrent opened 1 year ago

benwtrent commented 1 year ago

Description

Currently the only way for users to do a brute-force or exact search with kNN is with a script_score query. This requires some knowledge of the function names and scoring methodologies.

We should provide a better interface for exact scan. One idea is to have an exact: true field within kNN.

The name of the field is debatable, as is whether we update kNN at all. Maybe a new kNN query that allows for both exact and approximate within the query DSL?
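For readers less familiar with the terminology, here is a minimal sketch of what an exact (brute-force) scan means: score every vector against the query and keep the k best. The class and method names are hypothetical and dot product is just one possible similarity; this is not Elasticsearch or Lucene code.

```java
import java.util.PriorityQueue;

/** Illustrative brute-force kNN. Hypothetical names, not Elasticsearch or Lucene code. */
final class ExactKnnSketch {

    /** Dot product of two equal-length vectors. */
    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Scores every vector against the query and returns the ids of the k best, best first. */
    static int[] search(float[][] vectors, float[] query, int k) {
        float[] scores = new float[vectors.length];
        for (int doc = 0; doc < vectors.length; doc++) {
            scores[doc] = dotProduct(vectors[doc], query);
        }
        // Min-heap on score: the weakest of the current candidates sits at the head.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < vectors.length; doc++) {
            heap.add(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest so at most k candidates remain
            }
        }
        // Polling yields the weakest first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

Every document is visited, so the cost grows linearly with the number of vectors. That is the trade-off against approximate search: exact results, no graph traversal, more work per query.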

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 1 year ago

Some edge cases here are:

carlosdelest commented 1 year ago

Task Refinement

cc @benwtrent and @mayya-sharipova for validation

Questions

Implementation plan

In case we do this before making knn search a query, these are the changes to the code that I've planned:

Lucene

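    if (exact) {
      // Collect the filter's matches, minus deleted docs, into a bit set.
      Scorer scorer = filterWeight.scorer(ctx);
      BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);

      // Score every accepted doc; BitSetIterator's second argument is the iteration cost.
      return exactSearch(ctx, new BitSetIterator(acceptDocs, acceptDocs.cardinality()));
    }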
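To make the flow of the snippet above concrete, here is a self-contained sketch of the same idea using `java.util.BitSet` as a stand-in for Lucene's bit set: intersect the filter matches with the live docs, then score only the accepted documents. All names are hypothetical, and squared Euclidean distance is assumed purely for illustration.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

/** Illustrative filtered exact scan. Hypothetical names, not Lucene code. */
final class FilteredExactScanSketch {

    /** Keeps only documents that match the filter and have not been deleted. */
    static BitSet acceptDocs(BitSet filterMatches, BitSet liveDocs) {
        BitSet accept = (BitSet) filterMatches.clone();
        accept.and(liveDocs);
        return accept;
    }

    /** Squared Euclidean distance between two equal-length vectors. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Scores only the accepted docs and returns the ids of the k nearest, nearest first. */
    static int[] exactSearch(float[][] vectors, float[] query, BitSet accept, int k) {
        float[] dist = new float[vectors.length];
        // Max-heap on distance: the farthest current candidate sits at the head.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Float.compare(dist[b], dist[a]));
        for (int doc = accept.nextSetBit(0);
             doc >= 0 && doc < vectors.length;
             doc = accept.nextSetBit(doc + 1)) {
            dist[doc] = squaredDistance(vectors[doc], query);
            heap.add(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the farthest so at most k candidates remain
            }
        }
        // Polling yields the farthest first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

When the filter accepts fewer than k documents, the result simply contains all of them, ordered by distance.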

Elasticsearch

carlosdelest commented 1 year ago

After discussing with @benwtrent, we'll tackle this issue after https://github.com/elastic/elasticsearch/issues/97940

carlosdelest commented 1 year ago

This is no longer blocked - @liranabn for prioritisation.

benwtrent commented 1 year ago

We need to consider the case when vectors are not in an HNSW graph at all (e.g. "index: false"). We need to allow kNN queries and top level kNN to work there as well IMO. This may require some configuration from the user to indicate the similarity they want to use. Possibly, we just require the similarity to be set in the mapping and if they want to use custom similarity functions, they must switch back to script.

I wonder if we should allow similarity to be stored in the mapping configuration even when index: false.
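As a rough illustration of the "require the similarity to be set in the mapping" option, the sketch below resolves a configured similarity name to a scoring function and rejects a missing value. The enum, its methods, and the error message are hypothetical; the values returned are raw similarities (higher is better), not the scores Elasticsearch would report.

```java
import java.util.Locale;

/** Illustrative similarity lookup. Hypothetical names, not Elasticsearch code. */
enum SimilaritySketch {
    COSINE {
        @Override
        float compare(float[] a, float[] b) {
            // Yields NaN for a zero-magnitude vector; a real implementation would guard this.
            return dot(a, b) / (float) Math.sqrt(dot(a, a) * dot(b, b));
        }
    },
    DOT_PRODUCT {
        @Override
        float compare(float[] a, float[] b) {
            return dot(a, b);
        }
    },
    L2_NORM {
        @Override
        float compare(float[] a, float[] b) {
            // Negated squared distance, so that closer vectors compare higher.
            float sum = 0f;
            for (int i = 0; i < a.length; i++) {
                float d = a[i] - b[i];
                sum += d * d;
            }
            return -sum;
        }
    };

    /** Raw similarity between two equal-length vectors; higher means more similar. */
    abstract float compare(float[] a, float[] b);

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Resolves the configured name; a missing similarity is rejected rather than guessed. */
    static SimilaritySketch fromMapping(String configured) {
        if (configured == null) {
            throw new IllegalArgumentException(
                "similarity must be set in the mapping to run exact kNN on a non-indexed field");
        }
        return valueOf(configured.toUpperCase(Locale.ROOT));
    }
}
```

Under this assumption, anything outside the fixed set of names fails fast, which matches the idea that custom similarity functions stay in script queries.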

kderusso commented 1 year ago

Proposal:

This turns the issue into a doable action item that provides value to our users, and moves some of the edge-case questions surrounding flat indices outside the scope of this issue.

CC: @liranabn @benwtrent @mayya-sharipova

saikatsarkar056 commented 9 months ago

From the above discussion, we will take the following steps for the scope of this work.

saikatsarkar056 commented 9 months ago

For the Lucene changes, we need a new public interface that all leaf readers can read.

Assigning the issue to @benwtrent for the Lucene work. Once the Lucene work is done, the Search Relevance team can take the Elasticsearch work.

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)