Yelp / elastalert

Easy & Flexible Alerting With ElasticSearch
https://elastalert.readthedocs.org
Apache License 2.0

fix compound query key in metric aggregation with bucket_interval #3161

Closed just1900 closed 1 year ago

just1900 commented 3 years ago

Fix metric aggregation when a compound query key and bucket_interval are used together

What happened here

  1. Configure a rule with type metric_aggregation, a compound query_key, and bucket_interval, as below.
    type: metric_aggregation
    query_key: ["prometheus.labels.instance","prometheus.labels.app"]
    bucket_interval:
      seconds: 30
  2. Debugging through the call stack shows that the payload_data parameter of add_aggregation_data contains nested (compound) bucket_aggs keys, as shown below: https://github.com/Yelp/elastalert/blob/1dc4f30f30d39a689f419ce19c7e2e4d67a50be3/elastalert/ruletypes.py#L1000-L1007
{
    "bucket_aggs": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "kafka",
                "doc_count": 32916,
                "bucket_aggs": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "stress-test-kafka",
                            "doc_count": 17628,
                            "interval_aggs": {
                                "buckets": [
                                    {
                                        "key_as_string": "2021-03-31T02:11:30.000Z",
                                        "key": 1617156690000,
                                        "doc_count": 1469,
                                        "metric_prometheus.metrics.kafka_controller_kafkacontroller_activecontrollercount_sum": {
                                            "value": 1
                                        }
                                    },
...(duplicated character removed)

Then, after the indexing at https://github.com/Yelp/elastalert/blob/1dc4f30f30d39a689f419ce19c7e2e4d67a50be3/elastalert/ruletypes.py#L1005 and the call to unwrap_term_buckets at https://github.com/Yelp/elastalert/blob/1dc4f30f30d39a689f419ce19c7e2e4d67a50be3/elastalert/ruletypes.py#L1014-L1019, the aggregation_data parameter of check_matches looks like this: https://github.com/Yelp/elastalert/blob/1dc4f30f30d39a689f419ce19c7e2e4d67a50be3/elastalert/ruletypes.py#L1056-L1058

{
    "key": "kafka",
    "doc_count": 32916,
    "bucket_aggs": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "stress-test-kafka",
                "doc_count": 17628,
                "interval_aggs": {
                    "buckets": [
                        {
                            "key_as_string": "2021-03-31T02:11:30.000Z",
                            "key": 1617156690000,
                            "doc_count": 1469,
                            "metric_prometheus.metrics.kafka_controller_kafkacontroller_activecontrollercount_sum": {
                                "value": 1
                            }
                        },
...(duplicated character removed)

As the output shows, the interval_aggs key is not unwrapped here. In addition, check_matches_recursive only unwraps bucket_aggs recursively; it never descends into interval_aggs.
https://github.com/Yelp/elastalert/blob/1dc4f30f30d39a689f419ce19c7e2e4d67a50be3/elastalert/ruletypes.py#L1069-L1084

So if bucket_interval is configured in the rule, line 1127 raises a KeyError when indexing self.metric_key, because the innermost term bucket holds interval_aggs rather than the metric itself.
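
The failure can be reproduced outside ElastAlert with a small sketch. The function below is a simplified stand-in for the recursion described above, not the code in ruletypes.py: the names METRIC_KEY, COMPOUND_KEYS, and reproduce are made up for illustration, and the sample data is a trimmed copy of the aggregation_data dump shown earlier.

```python
# Simplified stand-in for the recursion described above; NOT the actual elastalert code.
METRIC_KEY = "metric_prometheus.metrics.kafka_controller_kafkacontroller_activecontrollercount_sum"
COMPOUND_KEYS = ["prometheus.labels.instance", "prometheus.labels.app"]


def check_matches_recursive(aggregation_data, compound_keys, match_data, matches):
    """Descend through nested bucket_aggs only, mirroring the behaviour described in this PR."""
    if not compound_keys:
        return
    match_data[compound_keys[0]] = aggregation_data["key"]
    if "bucket_aggs" in aggregation_data:
        for result in aggregation_data["bucket_aggs"]["buckets"]:
            check_matches_recursive(result, compound_keys[1:], match_data, matches)
    else:
        # With bucket_interval set, this leaf holds interval_aggs, not the metric,
        # so the lookup below raises KeyError.
        metric_val = aggregation_data[METRIC_KEY]["value"]
        matches.append((dict(match_data), metric_val))


# Trimmed copy of the aggregation_data structure shown above.
aggregation_data = {
    "key": "kafka",
    "doc_count": 32916,
    "bucket_aggs": {
        "buckets": [
            {
                "key": "stress-test-kafka",
                "doc_count": 17628,
                "interval_aggs": {
                    "buckets": [
                        {
                            "key_as_string": "2021-03-31T02:11:30.000Z",
                            "key": 1617156690000,
                            "doc_count": 1469,
                            METRIC_KEY: {"value": 1},
                        }
                    ]
                },
            }
        ]
    },
}


def reproduce():
    """Return the missing key if the recursion fails, otherwise None."""
    matches = []
    try:
        check_matches_recursive(aggregation_data, COMPOUND_KEYS, {}, matches)
    except KeyError as err:
        return err.args[0]
    return None


failed_key = reproduce()
print(failed_key)
```

The recursion stops at the stress-test-kafka bucket because it has no bucket_aggs, then looks for the metric one level too early; the metric actually lives inside each entry of interval_aggs.buckets.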

How to fix it