`top_count_keys` is returning low-count terms

Yelp / elastalert

Easy & Flexible Alerting With ElasticSearch

Apache License 2.0


I'm having an issue where my config for a frequency rule is as below:

max_query_size: 10000

terms_size: 5
top_count_number: 5
top_count_keys:
  - "request"
  - "path"
  - "host"

and I'm not seeing the 'top' keys ordered by descending count in the alert text. Initially I had a problem with timeouts, so the top_count_keys section of the alert text was empty. Now it looks like the alert isn't getting the data for the 'top' keys: when I check in Kibana, I see up to 80 events per host for fields like host. Below is a sample alert:

At least 150 events occurred between 2016-04-04 09:31 EDT and 2016-04-04 09:36 EDT

request.raw:
/items/ajax_item_comments/...: 1
/Receipt/OrderReceipt/...: 1
/bar.js: 1

host.raw:
front...: 3

path.raw:
/var/log/nginx/access_log: 3

Why would the alert be missing some data?

I'm wondering if you hit an edge case with the timing. Do you have 80 events per host between 9:31 and 9:36? Or are there a bunch of events immediately before 9:31? When using use_terms_query, all the events in a single query are marked as having occurred at the end timestamp.
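One way to check the timing question is to count events per host strictly inside the alert window and compare that with the counts over a slightly wider range. The snippet below is only a rough sketch of that comparison: the events, the host names `front-a` and `front-b`, and the helper `count_in_window` are all made up for illustration and are not taken from ElastAlert or from the real logs.

```python
from collections import Counter
from datetime import datetime


def count_in_window(events, field, start, end):
    """Count values of `field` for events whose timestamp is in [start, end]."""
    return Counter(
        e[field] for e in events if start <= e["@timestamp"] <= end
    )


# Hypothetical events: most of front-a's traffic lands just before the window.
events = [
    {"@timestamp": datetime(2016, 4, 4, 9, 29, 0), "host": "front-a"},
    {"@timestamp": datetime(2016, 4, 4, 9, 30, 0), "host": "front-a"},
    {"@timestamp": datetime(2016, 4, 4, 9, 30, 59), "host": "front-a"},
    {"@timestamp": datetime(2016, 4, 4, 9, 32, 0), "host": "front-a"},
    {"@timestamp": datetime(2016, 4, 4, 9, 33, 0), "host": "front-b"},
    {"@timestamp": datetime(2016, 4, 4, 9, 34, 0), "host": "front-b"},
]

window_start = datetime(2016, 4, 4, 9, 31)
window_end = datetime(2016, 4, 4, 9, 36)

# Counts restricted to the alert window.
in_window = count_in_window(events, "host", window_start, window_end)
# Counts over everything, as a wider Kibana time range might show.
overall = Counter(e["host"] for e in events)

print(in_window.most_common())
print(overall.most_common())
```

In this made-up data, front-a dominates overall but has only one event inside the window, so a wider time range in Kibana would suggest much higher counts than the alert window actually contains.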

Another possibility is that it's trying to filter on the wrong thing. For example, if you have query_key set to host, and there was a value "foo-bar" for host, then the terms query would actually return separate buckets for "foo" and "bar", because of string analysis. In that case, the query to get the terms will try to be smart and filter for "foo" but it will also use .raw.
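To illustrate the analysis point, here is a small sketch of how term buckets differ between an analyzed field and its .raw counterpart. It assumes a crude approximation of string analysis (lowercase, then split on non-alphanumeric characters), which is not the real Elasticsearch analyzer, and the host values and helper names are hypothetical.

```python
import re
from collections import Counter


def analyzed_tokens(value):
    # Crude stand-in for an analyzed string field (an assumption, not the
    # real analyzer): lowercase and split on non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def term_buckets(values, analyzed):
    """Count documents per term, roughly as a terms query would bucket them."""
    counts = Counter()
    for value in values:
        if analyzed:
            # Each document counts once per distinct token it contains.
            counts.update(set(analyzed_tokens(value)))
        else:
            # A not-analyzed (.raw) field keeps the whole string as one term.
            counts[value] += 1
    return counts


hosts = ["foo-bar", "foo-bar", "foo-baz"]  # hypothetical host values

analyzed_buckets = term_buckets(hosts, analyzed=True)
raw_buckets = term_buckets(hosts, analyzed=False)

print(analyzed_buckets.most_common())
print(raw_buckets.most_common())
```

With the analyzed field, "foo-bar" never appears as a bucket at all, only its pieces do, so a later filter built from one of those pieces would not line up with the .raw values.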

Do you have query_key set to host or host.raw? You can also see the exact query being made if you add --es_debug_trace ~/file.log, which might be helpful.

`top_count_keys` is returning low-count terms #461