elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

terms queries performance degradation #109913

Open carlosdelest opened 1 week ago

carlosdelest commented 1 week ago

Elasticsearch Version

8.8 or higher

Installed Plugins

No response

Java Version

bundled

OS Version

Linux

Problem Description

Since Lucene PR https://github.com/apache/lucene/pull/12156, we have seen a performance degradation in bool queries where one of the mandatory clauses is a TermInSetQuery whose query terms are not present in the field. Before that change, TermInSetQuery returned a null ScorerSupplier in such cases, which short-circuited the whole bool query.
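
To illustrate why the null return matters, here is a toy model of the short-circuit in Python. It is only a sketch and not Lucene's code: `Clause`, `scorer_supplier`, `build_cost` and `conjunction_scorer_supplier` are hypothetical names, and `None` stands in for a null ScorerSupplier.

```python
from typing import Callable, List, Optional

# A "supplier" here is a callable that, when invoked, pretends to build a
# scorer and returns the (made-up) cost of doing so.
Supplier = Callable[[], int]


class Clause:
    """Hypothetical stand-in for one mandatory clause of a bool query."""

    def __init__(self, name: str, can_match: bool, build_cost: int) -> None:
        self.name = name
        self.can_match = can_match    # False models "no query term exists in the field"
        self.build_cost = build_cost  # illustrative cost of building the scorer

    def scorer_supplier(self) -> Optional[Supplier]:
        # Returning None models a null ScorerSupplier: nothing can match.
        if not self.can_match:
            return None
        return lambda: self.build_cost


def conjunction_scorer_supplier(clauses: List[Clause]) -> Optional[Supplier]:
    """All clauses are mandatory, so one None makes the whole query empty."""
    suppliers: List[Supplier] = []
    for clause in clauses:
        supplier = clause.scorer_supplier()
        if supplier is None:
            # Short-circuit: no scorer is ever built for any clause.
            return None
        suppliers.append(supplier)
    # Only reached when every mandatory clause may match.
    return lambda: sum(s() for s in suppliers)
```

In this model, a terms clause that can report "no match" up front costs nothing. A clause that always returns a supplier forces the conjunction to pay the scorer build cost even when the result is empty, which is the kind of regression described here.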

This has been fixed in Lucene with https://github.com/apache/lucene/pull/13454, but the fix did not make it into Lucene 9.11 and has not yet been included in Elasticsearch.

We need to either pick up that change as part of Lucene 9.12, or patch Elasticsearch to carry the fix until the Lucene change is included in ES.

Steps to Reproduce

Queries with a high number of terms in terms queries spend a long time in build_scorer, as reported by query profiling, specifically when those terms are not present in the field.
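
A request of roughly this shape could be used to reproduce the behaviour. This is a minimal sketch in Python that only builds the search body. The field names `tags` and `status`, the term values, and the count of 10,000 are assumptions made for illustration, and `build_profiled_request` is a hypothetical helper rather than part of any client library.

```python
import json
from typing import Any, Dict, List


def build_profiled_request(
    field: str, terms: List[str], other_filter: Dict[str, Any]
) -> Dict[str, Any]:
    """Build a search body: a bool query with two mandatory (filter) clauses."""
    return {
        "profile": True,  # ask Elasticsearch to return per-query timing breakdowns
        "query": {
            "bool": {
                "filter": [
                    {"terms": {field: terms}},  # large terms clause
                    other_filter,               # any other mandatory clause
                ]
            }
        },
    }


# Assumed data: 10,000 terms that do not occur in the hypothetical "tags" field.
absent_terms = [f"missing-{i}" for i in range(10_000)]
body = build_profiled_request("tags", absent_terms, {"term": {"status": "active"}})

# Serialize for sending to the _search endpoint of the index under test.
payload = json.dumps(body)
```

The profile section of the response should then show how much time each clause spent in build_scorer.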

Logs (if relevant)

No response

elasticsearchmachine commented 1 week ago

Pinging @elastic/es-search (Team:Search)