garipovazamat opened 1 month ago
Pinging @elastic/es-search-relevance (Team:Search Relevance)
@garipovazamat first off let me just say wow congrats on working through upgrading to 8.x. I can still remember when I used 1.7.6.
I spent a bit of time digging in here to try to give you a better understanding of some of the underlying mechanics and options. And I plan to follow up with potential improvements where it makes sense.
Most of this discussion for now will come back to query formation. Because of how we store and retrieve percolator queries, you do have to be a little more thoughtful about how you write them, with the benefit that the ES cluster is a lot more stable. In general you will likely notice stability improvements across the board migrating from 1.7 to 8.15; for instance, while experimenting for this issue on 1.7.6 I hit seg faults multiple times, but with 8.x I haven't had any.
In my own tests I'm not seeing, and don't expect to see, improvements over some of the metrics you reported, particularly for the case you provided on 8.x. It's easy to see from the article you mentioned how you might expect your query to be optimized automatically, but the percolator is not a query optimization engine, so query optimization, at least in this case, currently falls to you. The gains come from using Lucene to narrow what gets evaluated and to short circuit where (ideally) possible, but that takes some knowledge of how the evaluation occurs, and there are several limitations. The most intuitive description I can give is that a simpler query is constructed and kept in memory for short circuiting, with heuristics determining those covering queries. While automatic query optimization might seem like the obvious outcome of this issue, it could produce non-intuitive results and needs to be carefully considered. In 1.x, by contrast, the percolator queries were all kept entirely in memory and evaluated linearly, so as you add more queries the evaluation time goes up regardless of whether a query could have been short circuited (this may be counter to what you found, but I'll show you the data I collected and we can discuss).
There are likely improvements here we can explore, but I want to take that back for further discussion. Minimally, I think the documentation would benefit from more information about how these optimizations work; it is referenced to some extent in query-dsl-percolate-query.html#how-it-works, but I think it's pretty subtle, specifically the references to indexed term queries.
So the problem then is how we can optimize these queries. There are a few options; let me show you some data from my own evaluations, and then I'd be keen to see whether any of these work for you.
[Experiment results: for each experiment, timings in ms for each repeated percolate query against the flat1 and flat2 documents. The accompanying notes referred to heavily nested should and range clauses, which are evaluated from the bottom up cascading through leaf nodes, and to splitting each should clause into multiple percolator queries (Exp 5). The Exp 4 and Exp 5 queries are shown below.]
{ "query": { "bool": { "filter": { "terms": { "props.entity_obj.category": ["flat1", "flat3"] } }, "should": [ {"bool": { "must": [ {"term": {"props.entity_obj.category": "flat1"}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}} ] }}, {"bool": { "must": [ {"term": {"props.entity_obj.category": "flat3"}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}} ] }}, {"bool": { "must": [ {"term": {"props.entity_obj.category": "flat1"}}, {"term": {"props.entity_obj.category": "flat2"}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}} ] }} ] } } }
{ "query": { "bool": { "must": [ {"term": {"props.entity_obj.category": "flat1"}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}} ] } } }query 2:
{ "query": { "bool": { "must": [ {"term": {"props.entity_obj.category": "flat3"}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}}, {"range": {"props.price": {"gte": 1000}}} ] } } }
I took away several conclusions from these experiments.
For my experiments, 1.7.6 was slower overall than 8.15, which I realize conflicts with what you've reported. So it could be worth looking into your cluster configurations for both to see if they line up appropriately. The 8ms timings for 1.7.6 surprise me a bit knowing how the code is implemented. In my experiments I saw some overhead no matter what document was passed for evaluation in 1.7.6, which lines up with how each percolator query has to be fully evaluated against every document passed to ES.
When using 8.15 (or any distributed database) I would suggest trying to pre-join your percolator queries prior to indexing them; it's essentially the same concept as denormalizing documents before inserting them into ES (or most distributed stores), in that we shape the inserted data to prescribe optimizations to ES.
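To make "pre-join" concrete, here is a minimal sketch in the spirit of the Exp 5 queries above (field names and values are taken from the earlier examples and are only illustrative): rather than registering one percolator query with a nested should, register one flattened percolator query per branch, for example:

query 1:
{ "query": { "bool": { "must": [
  { "term": { "props.entity_obj.category": "flat1" } },
  { "range": { "props.price": { "gte": 1000 } } }
] } } }

query 2:
{ "query": { "bool": { "must": [
  { "term": { "props.entity_obj.category": "flat3" } },
  { "range": { "props.price": { "gte": 1000 } } }
] } } }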
There's also some pain here in intuiting how queries will perform, specifically with heavily nested boolean queries that mix should clauses and range conditions. The should clauses are preventing the optimizations you are expecting, so if we can rewrite the queries to compensate for this you should see query performance improvements. I took two approaches to this to see whether they would work for you, and also as a bootstrap for additional internal discussions about percolator queries.
Both Exp 4 and Exp 5 are examples of taking advantage of the term query optimization (introduced in 5.x) and, more importantly, of avoiding some pitfalls in how should is handled under the hood.
Specifically, Exp 4 tries to keep all of the query conditions from your examples, after joining some conditions that don't make sense together, like multiple categories. Because of the should condition this by itself is not sufficient and requires a filter condition.
Really, introducing a filter or a must_not at the top level will generally have the effect you are looking for without introducing additional metadata outside of the percolator query, particularly if those top-level filter conditions are term queries, which are extremely fast and always part of the query short-circuiting path.
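Stripped down to its core, that shape looks something like this (a minimal sketch; the field and value are only illustrative, and the should clause stands in for the original nested conditions):

{ "query": { "bool": {
  "filter": [
    { "term": { "props.entity_obj.category": "flat1" } }
  ],
  "should": [
    { "range": { "props.price": { "gte": 1000 } } }
  ]
} } }

The idea, per the above, is that the top-level term in the filter gives the percolator an indexed term it can always extract, so documents that don't contain it can be skipped before the should clauses are ever evaluated.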
Exp 5 takes this a step further: it indexes more than the original 10k percolator queries but simplifies each individual percolator query, effectively removing the problematic should and, more importantly, bypassing the need to introduce a filter clause. In my experiments this was overall the best route, but I didn't test with large numbers of percolator queries or real production data, so your mileage may vary here versus the approach in Exp 4.
Adding @martijnvg as well, as he might be able to add more color or correct me on anything here; I believe he did much of the work on percolators in 5.x and since.
@john-wagster Thank you for your response and for taking the time to look at my experiments!
The example I provided above was just an attempt to understand whether there is a pre-select optimization. Our production case is more complex, and I didn't want to overload the issue by attaching it right away. In our case, we cannot simply set one filter in the filters block. I have attached data closer to our real production data below, but I don't understand how the improvements you suggested can be applied to this case. I assumed that the optimization is implemented similarly to what is described in this article https://martin.kleppmann.com/2015/04/13/real-time-full-text-search-luwak-samza.html, since Luwak is now part of Lucene, and Elasticsearch is based on Lucene.
I also double-checked the percolator for ES1, and it indeed performs very quickly and correctly returns the percolation results. The percolation time for flat1 is ~30ms (when searches match the document), and for flat2 it is ~3ms (when searches do not match).
I tested it locally through a Docker container, running it like this:
docker run -d --name elastic-custom -p 9200:9200 -p 9300:9300 --rm elasticsearch:1.7.6
I don't understand why it is so slow for you. I had about ~8GB of free RAM at the time of launch (16GB total, DDR4) and an Intel Core i3 8350k processor, running on Ubuntu 22.04.
I also ran ES8 via Docker, with this command:
docker run -d --name elastic-custom -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "ES_JAVA_OPTS=-Xms1G -Xmx1G -XX:ActiveProcessorCount=4" -e "bootstrap.memory_lock=true" --rm elasticsearch:8.15.2
For clarity, I conducted another experiment comparing the percolation time of ES1 and ES8 on data that is closer to our real production data.
Here are the settings for ES1:
I duplicated the query 10 000 times and percolated the same document 9 times in both Elasticsearch versions.
Results for ES8
Results in ms when the doc matches all searches:
646.0, 544.0, 540.0, 545.0, 553.0, 582.0, 584.0, 584.0, 578.0
mean: 572.9
Results in ms when the doc does not fall within the bbox:
470.0, 475.0, 481.0, 465.0, 480.0, 462.0, 461.0, 463.0, 461.0
mean: 468.7
Results for ES1
Results in ms when the doc matches all searches:
46.0, 62.0, 34.0, 29.0, 25.0, 28.0, 31.0, 23.0, 23.0
mean: 33.4
Results in ms when the doc does not fall within the bbox:
29.0, 38.0, 31.0, 19.0, 20.0, 18.0, 15.0, 17.0, 17.0
mean: 22.7
In these experiments, I observe the same difference; ES1 is approximately 17 times faster (for cases when searches match) and 21 times faster (when searches do not match) on the same searches. Additionally, I cannot understand how I could apply the optimizations you suggested to this data. Simply moving all filters to the filter block does not yield any improvements either. Perhaps the issue is not with the pre-select optimization at all, but rather a more general problem.
In production, I see a similar picture. With approximately the same cluster cost, the percolation speed in ES8 is about 10 to 15 times lower. I cannot find the exact reasons for this downgrade. The cluster settings for ES8 in production are as follows:
{
"settings": {
"index": {
"routing": {"allocation": {"include": {"_tier_preference": "data_content"}}},
"refresh_interval": "5s",
"number_of_shards": "6",
"number_of_replicas": "1"
}
}
}
There are no indexing errors for the queries. The query {"query": {"term" : {"query.extraction_result" : "failed"}}}
returns an empty response. Profiling percolate queries does not provide any valuable information. The queries in ES8 and ES1 are analogous, and the data volume is the same (~5 million searches in the percolator and ~2 documents per second are percolated).
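For completeness, I could also check for partially extracted queries with the same kind of term query (assuming, as I understand it, that the percolator field records a partial value in query.extraction_result when only some of a query's clauses could be extracted):

{ "query": { "term": { "query.extraction_result": "partial" } } }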
I also want to clarify: what do you mean by "pre-join your percolator queries"?
Perhaps the optimizations you suggested can be applied to my queries, but it is not obvious to me how they work. Maybe you can tell me where I can read about them? If they are not applicable, I would like to understand whether you can do something about it.
@garipovazamat could you try specifically setting minimum_should_match in the boolean clauses where should clauses exist? In filter contexts we no longer set minimum_should_match to 1; instead those should clauses are dropped.
The percolator is still executing a pre-filter and optimization check, but we might be dropping those should clauses on the floor.
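For example, something along these lines on the registered queries (a sketch using the field names from the earlier examples; the exact clauses in your production queries will differ):

{ "query": { "bool": { "must": [
  { "term": { "props.entity_obj.category": "flat1" } },
  { "bool": {
      "should": [
        { "term": { "props.entity_obj.category": "flat2" } },
        { "range": { "props.price": { "gte": 1000 } } }
      ],
      "minimum_should_match": 1
  } }
] } } }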
@benwtrent Do you mean to try this query?
I have just tried it; nothing changed. The percolation time is the same.
@garipovazamat is the key difference you have found in speed related directly to geo bounding box? Or is it ANY of the filters?
@benwtrent I think ANY of the filters. If I remove the geo bounding box filter, it doesn't become much faster.
@garipovazamat sorry for the slow response. Give me a bit more time here and I'll take a look at your queries and see if I can optimize them. I think it's probably a good exercise that we can reflect back into the docs: my suspicion is that optimizing these is possible (yep, I'm an optimist), does require some effort, and is a little non-intuitive.
I do think Ben's question around the geo bounding box will be relevant too. I know there was previously an attempt to add optimizations around how geo percolator queries are handled, but it was abandoned. I'll dig into that some more, but using a geo bounding box might be problematic.
We have been trying to migrate from Elasticsearch version 1.7.6 to the latest version (8.15) in our company and discovered that the latest version has become much slower. To find the reason for this degradation, I conducted several experiments. During these experiments, I found that some claimed improvements likely do not work as expected.
Experiment details
I created the following index mapping:
I filled the index with 10 000 duplicated queries, which contain only must, should, term and range conditions.
You can see that there are two main conditions inside must: the first is simple, and the second is a bit more complex. Logically, there is no reason to check the second condition if the first one is false. However, my experiments showed that if the first condition is false for a document, adding conditions inside should (the second condition) increases the percolation time. Therefore, I conclude that the improvements claimed in this article https://www.elastic.co/blog/elasticsearch-percolator-continues-to-evolve do not work.
I ran the percolation with the following request:
As a result, I got the following percolation time with one document: ~0.157 seconds. I conducted a similar experiment on Elasticsearch version 1.7.6 with identical data, and the result was ~0.008 seconds, which is ~20x faster.
We also tried percolating with real production data. The only improvement we saw was when we added additional filters alongside the percolate query using metadata, which we extracted from the primary query. For example, we took the query mentioned above and added metadata (a meta_data.category field). Then I sent the following request:
But this approach has a disadvantage: it becomes more difficult to percolate a large batch. If I need to percolate many documents, I have to separate them by the category field, resulting in smaller batches. This negates the improvement of percolating many documents in one query. I also tried named percolation (the name field in the percolate query) and made a query with a few percolate queries inside (one for each category), but this approach did not have any advantage compared to separate requests (the percolation time was the same). In general, extracting metadata and adding additional filters for this metadata seems like unnecessary work, forcing us to maintain those extra filters. It seems that the search engine should handle such optimizations itself. I suspect this is the "pre-selecting" feature.
Python scripts for experiments (Python 3.12): scripts
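To illustrate the metadata-filter approach described above, the general shape of such a request (simplified; field names are taken from the examples in this thread and the values are only illustrative) is:

{ "query": { "bool": { "filter": [
  { "term": { "meta_data.category": "flat1" } },
  { "percolate": {
      "field": "query",
      "document": {
        "props": { "entity_obj": { "category": "flat1" }, "price": 1500 }
      }
  } }
] } } }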
Conclusion
Currently, percolation, even with queries containing only simple filters, performs significantly slower than in older versions of Elasticsearch. It seems likely that the latest version lacks the pre-selecting optimization, or that it is not functioning correctly. Alternatively, I might have missed something, and it can be enabled. Either way, we cannot migrate to the current version of Elasticsearch due to this performance issue. I would appreciate any help you can provide to resolve this problem.