Open benwtrent opened 1 year ago
Pinging @elastic/es-search (Team:Search)
Some edge cases here are:
- `dense_vector` fields that are not indexed. This means they don't have a similarity_function defined. We need to allow users to specify their expected similarity function AND have a good default defined.
- `dense_vector`. Should we allow this? It's technically possible, as exact will simply be iterating the vectors and calculating similarity.

cc @benwtrent and @mayya-sharipova for validation
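To make the "iterating the vectors and calculating similarity" point concrete, here is a minimal plain-Java sketch of an exact scan with a caller-selected similarity. It is not Lucene or Elasticsearch code: `ExactKnnSketch`, `Hit` and the `Similarity` enum are hypothetical names, and the scoring formulas are simplified assumptions (raw dot product, and `1 / (1 + squared distance)` for L2).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an exact (brute-force) kNN scan; not Lucene code. */
final class ExactKnnSketch {

    /** A document id paired with its similarity score. */
    record Hit(int doc, float score) {}

    /** Similarity the caller picks, e.g. when the field mapping defines none. */
    enum Similarity {
        DOT_PRODUCT {
            @Override
            float score(float[] a, float[] b) {
                float dot = 0f;
                for (int i = 0; i < a.length; i++) {
                    dot += a[i] * b[i];
                }
                return dot;
            }
        },
        L2_NORM {
            @Override
            float score(float[] a, float[] b) {
                float sum = 0f;
                for (int i = 0; i < a.length; i++) {
                    float d = a[i] - b[i];
                    sum += d * d;
                }
                // Smaller distance gives a higher score.
                return 1f / (1f + sum);
            }
        };

        abstract float score(float[] a, float[] b);
    }

    private static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble(Hit::score);

    /** Scores every vector against the query and returns the k best, highest score first. */
    static List<Hit> search(float[][] vectors, float[] query, int k, Similarity similarity) {
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);
        for (int doc = 0; doc < vectors.length; doc++) {
            heap.add(new Hit(doc, similarity.score(vectors[doc], query)));
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest hit
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(BY_SCORE.reversed());
        return hits;
    }
}
```

Calling `ExactKnnSketch.search(vectors, query, 2, ExactKnnSketch.Similarity.DOT_PRODUCT)` visits every vector once, which is why the similarity has to come from somewhere (the mapping, the request, or a default) when the field is not indexed.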
In case we do this before making knn search a query, these are the changes to the code that I've planned:
- Change `AbstractKnnVectorQuery` to add an `exact` attribute to it for doing exact search.
- Change `AbstractKnnVectorQuery.getLeafResults` to do an exact search in case the `exact` attribute is set, similar to the current exact searches that are done due to too many nodes visited or fewer than k possible matches:
if (exact) {
Scorer scorer = filterWeight.scorer(ctx);
BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
return exactSearch(ctx, new BitSetIterator(acceptDocs, k));
}
- Change the `KnnVectorQuery`, `KnnByteVectorQuery` and `KnnFloatVectorQuery` queries, adding the `exact` attribute. I'm thinking of using a Builder or a Parameter Object to avoid duplicating the current constructors, though that would involve changes in the callers as well.
- Tests for the `exact` query parameter, ensuring it performs an exact search. I'm thinking about checking that the proper method (`exactSearch`) is invoked, but happy to hear other opinions on how to test this.
- Change `KnnVectorQueryBuilder` to add an `exact` parameter.
- Change `KnnVectorQueryBuilder.doToQuery` to pass along the `exact` parameter to `DenseVectorFieldMapper.createKnnQuery`.
- Change `DenseVectorFieldMapper.createKnnQuery` to create the Lucene `KnnByteVectorQuery` or `KnnFloatVectorQuery` with the Elasticsearch query `exact` parameter.
- `AbstractKnnVectorQueryBuilderTestCase` tests.

After discussing with @benwtrent, we'll tackle this issue after https://github.com/elastic/elasticsearch/issues/97940
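As an illustration of the Builder / Parameter Object idea from the list above, here is a small self-contained sketch. Everything in it is hypothetical: `KnnQueryParams`, its `Builder` and the `Path` enum are made-up names, not Lucene or Elasticsearch classes. It only shows how an optional `exact` flag could be added without another constructor overload.

```java
/**
 * Hypothetical parameter object for a kNN vector query.
 * None of these names exist in Lucene; they only illustrate the idea.
 */
final class KnnQueryParams {
    final String field;
    final float[] target;
    final int k;
    final boolean exact;

    private KnnQueryParams(Builder b) {
        this.field = b.field;
        this.target = b.target.clone(); // defensive copy of the query vector
        this.k = b.k;
        this.exact = b.exact;
    }

    /** Required arguments go here; optional ones get fluent setters on the builder. */
    static Builder builder(String field, float[] target, int k) {
        return new Builder(field, target, k);
    }

    /** Which search path a query built from these parameters would take. */
    enum Path { EXACT, APPROXIMATE }

    Path path() {
        return exact ? Path.EXACT : Path.APPROXIMATE;
    }

    static final class Builder {
        private final String field;
        private final float[] target;
        private final int k;
        private boolean exact = false; // assumption: approximate search stays the default

        private Builder(String field, float[] target, int k) {
            if (field == null || target == null) {
                throw new IllegalArgumentException("field and target are required");
            }
            if (k < 1) {
                throw new IllegalArgumentException("k must be at least 1, got " + k);
            }
            this.field = field;
            this.target = target;
            this.k = k;
        }

        /** Optional flag: request an exact scan instead of the approximate search. */
        Builder exact(boolean exact) {
            this.exact = exact;
            return this;
        }

        KnnQueryParams build() {
            return new KnnQueryParams(this);
        }
    }
}
```

Existing callers would keep writing `KnnQueryParams.builder(field, target, k).build()`, and only callers that want a full scan would add `.exact(true)`. Exposing the chosen path, as `path()` does here, is one possible way to assert in a unit test that the exact branch was taken without spying on `exactSearch`.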
This is no longer blocked - @liranabn for prioritisation.
We need to consider the case when vectors are not in an HNSW graph at all (e.g. "index: false"). We need to allow kNN queries and top level kNN to work there as well IMO. This may require some configuration from the user to indicate the similarity they want to use. Possibly, we just require the similarity to be set in the mapping and if they want to use custom similarity functions, they must switch back to `script`.
I wonder if we should allow `similarity` to be stored in the mapping configuration even when `index: false`.
Proposal:
- Add `exact: true` to the KNN request.
  - `indexed: true`
  - `indexed: false`: this should probably error

This turns this issue into a doable action item that provides value to our users, and defers some of the edge-case questions surrounding flat indices to outside the scope of this issue.
CC: @liranabn @benwtrent @mayya-sharipova
From the above discussion, we will take the following steps for the scope of this work.
For the Lucene changes, we need a new public interface that all leaf readers can read.
Assigning the issue to @benwtrent for the Lucene work. Once the Lucene work is done, the Search Relevance team can take the Elasticsearch work.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Description
Currently the only way for users to do a brute-force or exact search with KNN is with a script query. This requires some knowledge of the function names and scoring methodologies.
We should provide a better interface for exact scan. One idea is to have an `exact: true` field within `kNN`.
The name of the field is debatable, as is whether we update kNN at all. Maybe a new `kNN` query that allows for both exact and approximate within the query DSL?