igoristic opened this issue 4 years ago
Pinging @elastic/stack-monitoring (Team:Monitoring)
Good stuff @igoristic!!
I found this super helpful comment by @polyfractal about how `search.max_buckets` works, and I didn't realize it worked this way. It seems that we should be setting the `shard_size` to `config.get('monitoring.ui.max_bucket_size')` (as well as `size` too, like we are) to ensure that we don't reject the request for searching too many buckets across a single shard. It makes me wonder why ES doesn't do this automatically - is there a reason why the `shard_size` should be set to anything more than `search.max_buckets`? Should ES do a `Math.min(shardSizeCalculation, searchMaxBuckets)` when determining the actual `shard_size` value? cc @polyfractal on this one
Before getting into details, there is discussion ongoing about deprecating or changing the behavior of `max_buckets`, with the tentative plan leaning towards only counting buckets at the very end of reduction (e.g. so that it behaves purely as a bucket size limiter on the response), and potentially changing the default to a much higher value.
> It seems that we should be setting the shard_size to config.get('monitoring.ui.max_bucket_size') (as well as size too, like we are) to ensure that we don't reject the request for searching too many buckets across a single shard.
++, I think that will help. But note that it is also not sufficient to "protect" the aggregation from tripping the threshold. E.g. you could have two aggs each collect `searchMaxBuckets` (either one under the other, or next to each other) and still trip.
Theoretically the user would need to configure the agg so that the sum of all generated buckets is under the threshold. Sorta doable with a `terms` agg (e.g. in the above case, `(10k * 1.5 + 10) * (2 * 1.5 + 10) == 195,130 buckets`), but impossible with something like `date_histo` because you don't know how many buckets it will generate. That's the main driving force behind deprecating the setting since it's a pretty hard user experience :(
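(For reference, the default per-shard collection size for a `terms` agg is roughly `size * 1.5 + 10`, which is where those numbers come from. A quick sanity check of the arithmetic:)

```js
// Default terms-agg shard_size heuristic: roughly size * 1.5 + 10
const defaultShardSize = (size) => Math.floor(size * 1.5 + 10);

// Two nested terms aggs with sizes 10000 and 2, collected on a single shard:
console.log(defaultShardSize(10000) * defaultShardSize(2)); // 15010 * 13 = 195130 buckets
```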
> is there a reason why the shard_size should be set to anything more than search.max_buckets? Should ES do a Math.min(shardSizeCalculation, searchMaxBuckets) when determining the actual shard_size value
Yes, we should probably be doing that as an extra safety measure. I don't think there's a reason we don't do that... just an oversight when `max_buckets` was added. It'll be moot if we change how the setting works, but if we don't we should definitely implement this.
@polyfractal Great to know! I didn't know about https://github.com/elastic/elasticsearch/issues/51731 but we'll follow along now.
@igoristic I think, for now, we can open a PR that adds `shard_size` to all the places we are doing `size: config.get('monitoring.ui.max_bucket_size')` and adjust once/if the changes happen on the ES side. WDYT?
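For illustration, a rough sketch of what that change could look like in one of those places (hypothetical snippet, not the actual Stack Monitoring code):

```js
// Hypothetical sketch: use the configured max bucket size for both `size` and `shard_size`.
// `config` is the Kibana config object referenced elsewhere in this thread.
const maxBucketSize = config.get('monitoring.ui.max_bucket_size');

const aggs = {
  indices: {
    terms: {
      field: 'shard.index',
      size: maxBucketSize,       // caps the buckets returned in the final response
      shard_size: maxBucketSize, // caps the buckets each shard sends to the coordinator
    },
  },
};
```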
Side note: in case the `size` is ever something smaller than `max_buckets`, you should probably fall back to the built-in heuristics. E.g. if the size is set to `2`, you don't want to collect and serialize `max_buckets` to the coordinator :)
Thank you @polyfractal for the very insightful information 🙇
> I think, for now, we can open a PR that adds shard_size to all the places we are doing size: config.get('monitoring.ui.max_bucket_size') and adjust once/if the changes happen on the ES side. WDYT?
++ @chrisronline I was thinking the same thing after I read the comment you mentioned.
Though, I also think we should try to pre-calculate `size` where possible (this can be a separate PR/issue), since in most cases we just need it to be the same as the count of the items we expect to get back (which should be significantly smaller than `shard_size` or `max_bucket_size`). We can use it as part of our min like you mentioned above: `Math.min(shardSizeCalculation, searchMaxBuckets, preCalculatedSize)`
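A rough sketch of how that selection could look (hypothetical helper, not existing code), folding in the fallback to the built-in heuristic mentioned above:

```js
// Hypothetical sketch of picking size / shard_size for an agg
function getBucketSizes({ preCalculatedSize, searchMaxBuckets, configuredMaxBucketSize }) {
  // Final response size: the smallest bound we actually know about
  const size = Math.min(
    preCalculatedSize != null ? preCalculatedSize : configuredMaxBucketSize,
    searchMaxBuckets
  );

  // Per-shard collection size: never above search.max_buckets, and when `size` is small,
  // stay close to the built-in heuristic (~ size * 1.5 + 10) instead of collecting
  // searchMaxBuckets buckets on every shard
  const shardSize = Math.min(Math.floor(size * 1.5 + 10), searchMaxBuckets);

  return { size, shard_size: shardSize };
}
```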
> we should try to pre-calculate size where possible
Where do you see us being able to do this? I'm honestly blanking on a single place where we can leverage this. When we set the `size: config.get('monitoring.ui.max_bucket_size')`, we do it because we don't know how many items there are.
@chrisronline In the body of the post I gave a little example of how we can do this with indices: we already know the total count of indices, and can use it as our aggs `size` (which yields the same results as `size: 10000` would) in our `get_indices.js > getIndices` query.
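As a sketch (hypothetical code, not what `get_indices.js` does today), the idea is simply:

```js
// Hypothetical sketch: cap the aggs size by the number of items we expect back,
// where `indexCount` comes from an earlier query that already told us how many indices exist
const aggsSize = Math.min(indexCount, config.get('monitoring.ui.max_bucket_size'));
```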
But I think this should be a separate "improvement" debt, and that the `shard_size` solution is good enough for the scope of this issue.
@chrisronline I've played around with `shard_size` and `size` a little, and I think it only masks the problem. This will still use up the same amount of memory, since `.monitoring-es-*` indices only have one shard each. I was still able to reproduce the error and occasionally got ES to terminate with an OOM error. Maybe I'm missing something, but wouldn't the user be better off increasing their `search.max_buckets` instead?
For all we know this might be happening because they have their collection set at a really high rate, or perhaps they increased the monitoring retention period to something beyond what our queries are optimized for.
I noticed we have `metrics:max_buckets` in Management > Advanced Settings, which is set to 2000 by default. I think we should also consider defaulting our monitoring `max_bucket_size` to something similar. Doing this in conjunction with `shard_size` would really help us avoid the too many buckets error and also save some memory. And, if the user feels the metrics are "less" accurate, they can always increase that number (along with `search.max_buckets`) at their own discretion. WDYT?
I think we might be on different pages here unfortunately.
> This will still use up the same amount of memory
When thinking this through, I wasn't really concerned about the memory overhead on the Elasticsearch nodes. I'm assuming `search.max_buckets` is designed to handle this problem (and there are potential plans to deprecate this: https://github.com/elastic/elasticsearch/issues/51731). What I'm more interested in is something like this test:
PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 5
  }
}

POST .monitoring-es-*/_search
{
  "size": 0,
  "sort": {
    "timestamp": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "shards"
          }
        }
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "shard.index",
        "size": 5,
        "shard_size": 5
      }
    }
  }
}
If you remove `shard_size` from the query, the query will end up failing due to too many buckets. Adding the `shard_size` fixes this.
@chrisronline Sorry if I miscommunicated. I think we're on the same page. My point was: what do we do with more complex queries that might still trigger the max buckets error due to calculations/buffering, or if we use `max_bucket_size` multiple times, or `date_histogram` queries? E.g. (will cause an error):
GET .monitoring-es-*/_search
{
  "size": 0,
  "sort": {
    "timestamp": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "shards"
          }
        }
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "shard.index",
        "size": 5,
        "shard_size": 5
      },
      "aggs": {
        "by_date": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          }
        }
      }
    }
  }
}
Also, I think we should be conscious about memory/cpu footprints here as well, since I did eventually get my ES to crash with an `OutOfMemoryError` (testing it with several large `.monitoring` indices and 10k max buckets). I guess if the cluster is under-provisioned and you throw 10k buckets at it, there is nothing preventing it from using up all the memory to try to achieve those results (since the "too many buckets" breaker will never be tripped, which might cause OOM errors).
Yup, good point @igoristic
I'm not sure what to do here.
I don't know if there is a more appropriate `size` to use (other than the one we are using now) where we can ensure this issue is fully resolved. The more sub-aggregations we have, the smaller we realistically have to make the `size` (and `shard_size`), to the point where the data just isn't accurate.
Maybe we wait until there is resolution on https://github.com/elastic/elasticsearch/issues/51731. WDYT?
I don't know if there is a quick/short-term solution. I was thinking for now maybe we could implement the "safest" approach, where we decrease the `max_bucket_size` default to something like 2000 (like Metrics does) and also use `shard_size` where possible.
Also, I don't know if it's common sense, but maybe our docs should also convey that if the monitoring collection rate and/or monitoring retention period is increased, so should `search.max_buckets` or `max_bucket_size`, depending on which we go with.
@igoristic can you schedule a 30 minute call for the 3 of us to go over this?
As an update from our side, someone on the analytics team is currently working on https://github.com/elastic/elasticsearch/issues/51731. Still no planned release (just got started), but we're hoping to resolve it sooner than later.
Would greatly appreciate your take on this @chrisronline
Most queries in the Stack Monitoring backend use `config.get('monitoring.ui.max_bucket_size')` to set the `size` property in the `aggs`. If this property is not set in the config it'll default to 10,000, which is also the cluster's `search.max_buckets`. This will cause a `too_many_buckets_exception` error if there is enough data to trigger it. This approach works fine when the aggregation is expected to yield a lot fewer data points (with a relatively small shard count per index ratio) than the implied `aggs.size`, but it will break once it's the other way around.

I think we should avoid defaulting `max_bucket_size` to `search.max_buckets` for the aggregation size where possible. In some cases we can even calculate the `size` if we know approximately how many items we expect to get back. One example is how we query shard stats for indices, e.g.:
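(A simplified sketch of that query, not the exact code; filters and field names may differ:)

```
POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "shards" } }
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "shard.index",
        "size": 10000
      }
    }
  }
}
```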
"field": "shard.index"
aggregation has 10000 as itssize
. Running this on a cluster that has a lot of indices (and shards) will result with the too many buckets error. But, changing the aggs size to something like:(item_count * 1.5 + 10)
while making sure it respectsshard_size
should be able to return the results with the same accuracy without triggering the error (if I understand this correctly).To test the case above you will need to simulate a pretty large cluster with lots of big indices (similar to this). You might also want to downplay your cluster's
search.max_buckets
setting (which should also be changed here)Then you need to obtain the
state_uuid
which is taken from our cluster status query:Be sure to change the
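(A sketch of what that lookup could look like; the `type: cluster_stats` filter and the `cluster_state.state_uuid` field are assumptions about the `.monitoring-es-*` document structure:)

```
POST .monitoring-es-*/_search
{
  "size": 1,
  "sort": { "timestamp": { "order": "desc" } },
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "cluster_stats" } },
        { "term": { "cluster_uuid": "<your cluster_uuid>" } },
        { "range": { "timestamp": { "gte": "now-10m" } } }
      ]
    }
  },
  "_source": [ "cluster_state.state_uuid" ]
}
```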
Be sure to change the `timestamp` and `cluster_uuid` to values relevant to your environment. We then collect the list of indices from the following query:
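(Again a sketch rather than the exact `get_indices.js` query; it assumes the shard documents carry a top-level `state_uuid` field and counts the distinct indices with a `cardinality` agg:)

```
POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "shards" } },
        { "term": { "state_uuid": "<state_uuid from the previous query>" } }
      ]
    }
  },
  "aggs": {
    "index_count": {
      "cardinality": { "field": "shard.index" }
    }
  }
}
```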
As you can see, we can get the count from this query and then do our `aggs.size` calculation mentioned above.

We can use a similar approach in other places where we use `max_bucket_size`, and if we truly do need "everything" (and we don't know the count upfront) then we can start looking into composite queries again (revisiting https://github.com/elastic/kibana/issues/36358).