Yelp / elastalert

Easy & Flexible Alerting With ElasticSearch
https://elastalert.readthedocs.org
Apache License 2.0
8k stars 1.73k forks

Unable to reason for aggregation alert #3056

Open lakshit07 opened 3 years ago

lakshit07 commented 3 years ago

I am using a frequency rule with aggregation to raise an alert when the number of events reaches 10 within any given minute. I also don't want to re-alert within 30 minutes. I have the following rule to this effect -

type: frequency
num_events: 10

timeframe:
  minutes: 1

realert:
  minutes: 30

query_key: [<something>]

aggregation:
  minutes: 1

aggregation_key: [<something>]

However, in one instance I noticed the following in the logs -

INFO:elastalert:Queried rule *** from 2020-12-01 03:56 UTC to 2020-12-01 03:57 UTC: 8 / 8 hits
INFO:elastalert:Ran *** from 2020-12-01 03:56 UTC to 2020-12-01 03:57 UTC: 8 query hits (0 already seen), 0 matches, 0 alerts sent

INFO:elastalert:Queried rule *** from 2020-12-01 03:57 UTC to 2020-12-01 03:58 UTC: 7 / 7 hits
INFO:elastalert:Queried rule *** from 2020-12-01 03:58 UTC to 2020-12-01 03:58 UTC: 0 / 0 hits
INFO:elastalert:New aggregation for ***, aggregation_key: *** . next alert at 2020-12-01 03:59:46.311910+00:00.
INFO:elastalert:Ran *** from 2020-12-01 03:57 UTC to 2020-12-01 03:58 UTC: 0 query hits (0 already seen), 1 matches, 0 alerts sent

INFO:elastalert:Queried rule *** from 2020-12-01 03:58 UTC to 2020-12-01 03:59 UTC: 17 / 17 hits
INFO:elastalert:Ignoring match for silenced rule ***.***

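I have two questions -

  • Why did the hits from 03:56-03:57 and 03:57-03:58 get clubbed together to trigger an alert?
  • Even if they did, why was the _numhits printed in the alert message 7 and not 15?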

lakshit07 commented 3 years ago

Hi @nsano-rururu, my question is not about the realert. I understand why the 17 hits didn't trigger an alert. My question is: why did 8 + 7 hits across two different one-minute intervals trigger an alert?

daiwei233 commented 3 years ago

I am using a frequency rule with aggregation to raise an alert when the number of events reaches 10 within any given minute. I also don't want to re-alert within 30 minutes. I have the following rule to this effect -

type: frequency
num_events: 10

timeframe:
  minutes: 1

realert:
  minutes: 30

query_key: [<something>]

aggregation:
  minutes: 1

aggregation_key: [<something>]

However, in one instance I noticed the following in the logs -

INFO:elastalert:Queried rule *** from 2020-12-01 03:56 UTC to 2020-12-01 03:57 UTC: 8 / 8 hits
INFO:elastalert:Ran *** from 2020-12-01 03:56 UTC to 2020-12-01 03:57 UTC: 8 query hits (0 already seen), 0 matches, 0 alerts sent

INFO:elastalert:Queried rule *** from 2020-12-01 03:57 UTC to 2020-12-01 03:58 UTC: 7 / 7 hits
INFO:elastalert:Queried rule *** from 2020-12-01 03:58 UTC to 2020-12-01 03:58 UTC: 0 / 0 hits
INFO:elastalert:New aggregation for ***, aggregation_key: *** . next alert at 2020-12-01 03:59:46.311910+00:00.
INFO:elastalert:Ran *** from 2020-12-01 03:57 UTC to 2020-12-01 03:58 UTC: 0 query hits (0 already seen), 1 matches, 0 alerts sent

INFO:elastalert:Queried rule *** from 2020-12-01 03:58 UTC to 2020-12-01 03:59 UTC: 17 / 17 hits
INFO:elastalert:Ignoring match for silenced rule ***.***

I have two questions -

  • Why did the hits from 03:56-03:57 and 03:57-03:58 get clubbed together to trigger an alert?
  • Even if they did, why was the _numhits printed in the alert message 7 and not 15?

The 03:56-03:57 run has no matches. "Hits" means Elasticsearch query hits, not rule matches, so no alert was sent for that run.
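
To illustrate how hits from two adjacent query runs could combine into one match, here is a minimal Python sketch of a sliding-window frequency check. It is not ElastAlert's actual code. The function name `first_match` and the event timestamps are hypothetical, chosen only so that 8 events fall late in 03:56-03:57 and 7 fall early in 03:57-03:58. It assumes the rule counts events in a window of length `timeframe` that slides over event timestamps instead of resetting at each query boundary.

```python
from collections import deque
from datetime import datetime, timedelta


def first_match(timestamps, num_events, timeframe):
    """Return the timestamp at which a sliding window first holds
    num_events events, or None if that never happens."""
    window = deque()
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events older than `timeframe` relative to the newest event.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            return ts
    return None


# Hypothetical timestamps (not taken from the logs above).
base = datetime(2020, 12, 1, 3, 56)
run_1 = [base + timedelta(seconds=40 + i) for i in range(8)]  # 03:56:40 .. 03:56:47
run_2 = [base + timedelta(seconds=65 + i) for i in range(7)]  # 03:57:05 .. 03:57:11

one_minute = timedelta(minutes=1)

# Neither run reaches 10 events on its own.
alone_1 = first_match(run_1, 10, one_minute)
alone_2 = first_match(run_2, 10, one_minute)

# Taken together, the 10th event lands within one minute of the first.
combined = first_match(run_1 + run_2, 10, one_minute)
print(alone_1, alone_2, combined)
```

With these made-up timestamps, no single run reaches 10 events, but the 10th event overall arrives 26 seconds after the first, so the window holds 10 events and a match is produced while processing the second run.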