igoristic opened this issue 4 years ago
Pinging @elastic/stack-monitoring (Team:Monitoring)
Good stuff @igoristic!!
I found this super helpful comment by @polyfractal about how `search.max_buckets` works, and I didn't realize it worked this way. It seems that we should be setting the `shard_size` to `config.get('monitoring.ui.max_bucket_size')` (as well as `size` too, like we are) to ensure that we don't reject the request for searching too many buckets across a single shard. It makes me wonder why ES doesn't do this automatically - is there a reason why the `shard_size` should be set to anything more than `search.max_buckets`? Should ES do a `Math.min(shardSizeCalculation, searchMaxBuckets)` when determining the actual `shard_size` value? cc @polyfractal on this one
Before getting into details, there is discussion ongoing about deprecating or changing the behavior of `max_buckets`, with the tentative plan leaning towards only counting buckets at the very end of reduction (e.g. so that it behaves purely as a bucket size limiter on the response), and potentially changing the default to a much higher value.
> It seems that we should be setting the shard_size to config.get('monitoring.ui.max_bucket_size') (as well as size too, like we are) to ensure that we don't reject the request for searching too many buckets across a single shard.
++, I think that will help. But note that it is also not sufficient to "protect" the aggregation from tripping the threshold. E.g. you could have two aggs each collect `searchMaxBuckets` (either one under the other, or next to each other) and still trip.
Theoretically the user would need to configure the agg so that the sum of all generated buckets is under the threshold. Sorta doable with a `terms` agg (e.g. in the above case, `(10k * 1.5 + 10) * (2 * 1.5 + 10) == 195,130 buckets`), but impossible with something like `date_histo` because you don't know how many buckets it will generate. That's the main driving force behind deprecating the setting since it's a pretty hard user experience :(
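(For reference, the default per-shard collection size for a `terms` agg is roughly `size * 1.5 + 10`, which is where those numbers come from. A quick sanity check of the arithmetic:)

```js
// Default terms-agg shard_size heuristic: roughly size * 1.5 + 10
const defaultShardSize = (size) => Math.floor(size * 1.5 + 10);

// Two nested terms aggs with sizes 10000 and 2, collected on a single shard:
console.log(defaultShardSize(10000) * defaultShardSize(2)); // 15010 * 13 = 195130 buckets
```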
> is there a reason why the shard_size should be set to anything more than search.max_buckets? Should ES do a Math.min(shardSizeCalculation, searchMaxBuckets) when determining the actual shard_size value
Yes, we should probably be doing that as an extra safety measure. I don't think there's a reason we don't do that... just an oversight when `max_buckets` was added. It'll be moot if we change how the setting works, but if we don't we should definitely implement this.
@polyfractal Great to know! I didn't know about https://github.com/elastic/elasticsearch/issues/51731 but we'll follow along now.
@igoristic I think, for now, we can open a PR that adds `shard_size` to all the places we are doing `size: config.get('monitoring.ui.max_bucket_size')` and adjust once/if the changes happen on the ES side. WDYT?
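For illustration, a rough sketch of what that change could look like in one of those places (hypothetical snippet, not the actual Stack Monitoring code):

```js
// Hypothetical sketch: use the configured max bucket size for both `size` and `shard_size`.
// `config` is the Kibana config object referenced elsewhere in this thread.
const maxBucketSize = config.get('monitoring.ui.max_bucket_size');

const aggs = {
  indices: {
    terms: {
      field: 'shard.index',
      size: maxBucketSize,       // caps the buckets returned in the final response
      shard_size: maxBucketSize, // caps the buckets each shard sends to the coordinator
    },
  },
};
```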
Side note: in case the `size` is ever something smaller than `max_buckets`, you should probably fall back to the built-in heuristics. E.g. if the size is set to `2`, you don't want to collect and serialize `max_buckets` to the coordinator :)
Thank you @polyfractal for the very insightful information 🙇
> I think, for now, we can open a PR that adds shard_size to all the places we are doing size: config.get('monitoring.ui.max_bucket_size') and adjust once/if the changes happen on the ES side. WDYT?
++ @chrisronline I was thinking the same thing after I read the comment you mentioned.
Though, I also think we should try to pre-calculate `size` where possible (this can be a separate PR/issue), since in most cases we just need it to be the same as the count of the items we expect to get back (which should be significantly smaller than `shard_size` or `max_bucket_size`). We can use it as part of our min like you mentioned above: `Math.min(shardSizeCalculation, searchMaxBuckets, preCalculatedSize)`
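A rough sketch of how that selection could look (hypothetical helper, not existing code), folding in the fallback to the built-in heuristic mentioned above:

```js
// Hypothetical sketch of picking size / shard_size for an agg
function getBucketSizes({ preCalculatedSize, searchMaxBuckets, configuredMaxBucketSize }) {
  // Final response size: the smallest bound we actually know about
  const size = Math.min(
    preCalculatedSize != null ? preCalculatedSize : configuredMaxBucketSize,
    searchMaxBuckets
  );

  // Per-shard collection size: never above search.max_buckets, and when `size` is small,
  // stay close to the built-in heuristic (~ size * 1.5 + 10) instead of collecting
  // searchMaxBuckets buckets on every shard
  const shardSize = Math.min(Math.floor(size * 1.5 + 10), searchMaxBuckets);

  return { size, shard_size: shardSize };
}
```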
> we should try to pre-calculate size where possible
Where do you see us being able to do this? I'm honestly blanking on a single place where we can leverage this. When we set the `size: config.get('monitoring.ui.max_bucket_size')`, we do it because we don't know how many items there are.
@chrisronline In the body of the post I gave a little example of how we can do this with indices: we already know the total count of indices, and can use it as our aggs `size` (which yields the same results as `size: 10000` would) in our `get_indices.js > getIndices` query.
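As a sketch (hypothetical code, not what `get_indices.js` does today), the idea is simply:

```js
// Hypothetical sketch: cap the aggs size by the number of items we expect back,
// where `indexCount` comes from an earlier query that already told us how many indices exist
const aggsSize = Math.min(indexCount, config.get('monitoring.ui.max_bucket_size'));
```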
But I think this should be a separate "improvement" debt, and that the `shard_size` solution is good enough for the scope of this issue.
@chrisronline I've played around with `shard_size` and `size` a little, and I think it only masks the problem. This will still use up the same amount of memory, since `.monitoring-es-*` indices only have one shard each. I was still able to reproduce the error and occasionally got ES to terminate with an OOM error. Maybe I'm missing something, but wouldn't the user be better off increasing their `search.max_buckets` instead?
For all we know this might be happening because they have their collection set at a really high rate, or perhaps they increased the monitoring retention period to something beyond what our queries are optimized for.
I noticed we have `metrics:max_buckets` in Management > Advanced Settings, which is set to 2000 by default. I think we should also consider defaulting our monitoring `max_bucket_size` to something similar. Doing this in conjunction with `shard_size` would really help us avoid the too many buckets error and also save some memory. And, if the user feels the metrics are "less" accurate, they can always increase that number (along with `search.max_buckets`) at their own discretion. WDYT?
I think we might be on different pages here unfortunately.
> This will still use up the same amount of memory
When thinking this through, I wasn't really concerned about the memory overhead on the Elasticsearch nodes. I'm assuming `search.max_buckets` is designed to handle this problem (and there are potential plans to deprecate this: https://github.com/elastic/elasticsearch/issues/51731). What I'm more interested in is something like this test:
PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 5
  }
}

POST .monitoring-es-*/_search
{
  "size": 0,
  "sort": {
    "timestamp": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "shards"
          }
        }
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "shard.index",
        "size": 5,
        "shard_size": 5
      }
    }
  }
}
If you remove `shard_size` from the query, the query will end up failing due to too many buckets. Adding the `shard_size` fixes this.
@chrisronline Sorry if I miscommunicated. I think we're on the same page. My point was: what do we do with more complex queries that might still trigger the max buckets error due to calculations/buffering, or if we use `max_bucket_size` multiple times, or `date_histogram` queries? E.g. (will cause an error):
GET .monitoring-es-*/_search
{
  "size": 0,
  "sort": {
    "timestamp": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "shards"
          }
        }
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "shard.index",
        "size": 5,
        "shard_size": 5
      },
      "aggs": {
        "by_date": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          }
        }
      }
    }
  }
}
Also, I think we should be conscious about memory/cpu footprints here as well, since I did eventually get my ES to crash with an `OutOfMemoryError` (testing it with several large `.monitoring` indices and 10k max buckets). I guess if the cluster is under-provisioned and you throw 10k buckets at it, there is nothing preventing it from using up all the memory to try to achieve those results (since the "too many buckets" breaker will never be tripped, which might cause OOM errors).
Yup, good point @igoristic
I'm not sure what to do here.
I don't know if there is a more appropriate `size` to use (other than the one we are using now) where we can ensure this issue is fully resolved. The more sub-aggregations we have, the smaller we realistically have to make the `size` (and `shard_size`), to the point where the data just isn't accurate.
Maybe we wait until there is resolution on https://github.com/elastic/elasticsearch/issues/51731. WDYT?
I don't know if there is a quick/short-term solution. I was thinking for now maybe we could implement the "safest" approach, where we decrease the `max_bucket_size` default to something like 2000 (like Metrics does) and also use `shard_size` where possible.
Also, I don't know if it's common sense, but maybe our docs should also convey that if the monitoring collection rate and/or monitoring retention period is increased, so should `search.max_buckets` or `max_bucket_size`, depending on which we go with.
@igoristic can you schedule a 30 minute call for the 3 of us to go over this?
As an update from our side, someone on the analytics team is currently working on https://github.com/elastic/elasticsearch/issues/51731. Still no planned release (just got started), but we're hoping to resolve it sooner than later.
Would greatly appreciate your take on this @chrisronline
Most queries in the Stack Monitoring backend use `config.get('monitoring.ui.max_bucket_size')` to set the `size` property in the `aggs`. If this property is not set in the config it'll default to 10,000, which is also the cluster's `search.max_buckets`. This will cause a `too_many_buckets_exception` error if there is enough data to trigger it. This approach works fine when the aggregation is expected to yield a lot fewer data points (with a relatively small shard count per index ratio) than the implied `aggs.size`, but it will break once it's the other way around.

I think we should avoid defaulting `max_bucket_size` to `search.max_buckets` for the aggregation size where possible. In some cases we can even calculate the `size` if we know approximately how many items we expect to get back. One example is how we query shard stats for indices, e.g.:
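(A simplified sketch of that query, not the exact code; filters and field names may differ:)

```
POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "shards" } }
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "shard.index",
        "size": 10000
      }
    }
  }
}
```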
"field": "shard.index"
aggregation has 10000 as itssize
. Running this on a cluster that has a lot of indices (and shards) will result with the too many buckets error. But, changing the aggs size to something like:(item_count * 1.5 + 10)
while making sure it respectsshard_size
should be able to return the results with the same accuracy without triggering the error (if I understand this correctly).To test the case above you will need to simulate a pretty large cluster with lots of big indices (similar to this). You might also want to downplay your cluster's
search.max_buckets
setting (which should also be changed here)Then you need to obtain the
state_uuid
which is taken from our cluster status query:Be sure to change the
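(A sketch of what that lookup could look like; the `type: cluster_stats` filter and the `cluster_state.state_uuid` field are assumptions about the `.monitoring-es-*` document structure:)

```
POST .monitoring-es-*/_search
{
  "size": 1,
  "sort": { "timestamp": { "order": "desc" } },
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "cluster_stats" } },
        { "term": { "cluster_uuid": "<your cluster_uuid>" } },
        { "range": { "timestamp": { "gte": "now-10m" } } }
      ]
    }
  },
  "_source": [ "cluster_state.state_uuid" ]
}
```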
Be sure to change the `timestamp` and `cluster_uuid` to values relevant to your environment. We then collect the list of indices from the following query:
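(Again a sketch rather than the exact `get_indices.js` query; it assumes the shard documents carry a top-level `state_uuid` field and counts the distinct indices with a `cardinality` agg:)

```
POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "shards" } },
        { "term": { "state_uuid": "<state_uuid from the previous query>" } }
      ]
    }
  },
  "aggs": {
    "index_count": {
      "cardinality": { "field": "shard.index" }
    }
  }
}
```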
As you can see, we can get the count from this query and then do our `aggs.size` calculation mentioned above.

We can use a similar approach in other places where we use `max_bucket_size`, and if we truly do need "everything" (and we don't know the count upfront) then we can start looking into composite queries again (revisiting https://github.com/elastic/kibana/issues/36358).