Open ioanatia opened 1 year ago
Pinging @elastic/es-search (Team:Search)
I believe this feature is something that @russcam and team may also be interested in.
Absolutely on board with adding "exists" query support for the "rank_features" field! I noticed the issue talks about the inference pipeline not working as expected. Got me thinking – maybe there's a more effective way to spot those partial documents? How about relying on an error field? This could help us catch pipeline hiccups while still getting the document indexed. Just a thought!
There are two things here for "exists" on "rank_features"
I think @russcam and co. want "exists on a rank_features feature", not just on the field itself.
That's correct @benwtrent, the latter is what we would be interested in
Quick update here: while we don't support exists for a specific feature, the underlying logic would probably be the same as a term query, as we cannot infer if a feature exists or not without either:
@russcam I know this is "late to the party", but to check if a feature "exists" or not, you can do a term query against it. So, you can filter the query against a particular feature and then score via the rank_features if you wish.
So, to find feature bar in rank_features foo you could do
{
  "query": {
    "term": {
      "foo": "bar"
    }
  }
}
This will filter for all docs where rank_features foo has the particular feature bar
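The "filter on a feature, then score via the rank_features" idea from the comment above could be sketched as a single bool query. This is only an illustration: the helper name `feature_filter_query` is hypothetical, and it assumes the feature is addressed in a rank_feature query as `<field>.<feature>`, which is how rank_features fields are normally queried. It only builds the request body and does not talk to a cluster.

```python
import json


def feature_filter_query(field: str, feature: str) -> dict:
    """Build a search body that keeps only docs whose rank_features
    field `field` contains `feature`, then scores by that feature."""
    return {
        "query": {
            "bool": {
                # term query acts as the "feature exists" check (non-scoring)
                "filter": [{"term": {field: feature}}],
                # rank_feature scores on the single feature "<field>.<feature>"
                "must": [{"rank_feature": {"field": f"{field}.{feature}"}}],
            }
        }
    }


if __name__ == "__main__":
    # Body for the foo/bar example from the thread
    print(json.dumps(feature_filter_query("foo", "bar"), indent=2))
```

The printed body can be sent to the index's `_search` endpoint as-is.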
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Description
When using an ML model that outputs sparse vectors (like ELSER), the inference pipeline can sometimes fail while indexing new documents, so newly ingested documents are not enriched with the rank_features fields.
One approach in this case is to run an update by query that reindexes the documents in place, in the same index, through the ML pipeline. In this case, we would want to only update the documents that do not have rank_features fields, for example:
In theory this is a much simpler approach than duplicating all the data to reindex into another index or reissuing bulk indexing requests from scratch. It only requires a single Elasticsearch API request.
The problem is that rank_features fields do not support exists queries.
This also makes it difficult to get an accurate count of how many documents are missing the rank_features fields, and users have to rely on other fields/mechanisms to do a reindexing in place with update by query.
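As a rough illustration of the "other fields/mechanisms" workaround mentioned above, one option is to have the ingest pipeline set a separate marker field and select on that instead of on the rank_features field. Everything named here is an assumption for the sketch: the marker field `ml_enriched`, the index `my-index`, and the pipeline `my-elser-pipeline` are hypothetical and not taken from the issue. The code only builds the request body.

```python
import json

# Hypothetical marker field that the ingest pipeline would set on success
MARKER_FIELD = "ml_enriched"


def missing_enrichment_query(marker_field: str = MARKER_FIELD) -> dict:
    """Build a body matching docs where the marker field is absent,
    i.e. docs the inference pipeline did not enrich."""
    return {
        "query": {
            "bool": {
                "must_not": [{"exists": {"field": marker_field}}]
            }
        }
    }


if __name__ == "__main__":
    body = missing_enrichment_query()
    # The same body could be used for (names are hypothetical):
    #   POST /my-index/_count
    #   POST /my-index/_update_by_query?pipeline=my-elser-pipeline
    print(json.dumps(body, indent=2))
```

Using the same body for both the count and the update by query keeps the two in step, so the count reflects exactly the documents that would be reprocessed.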