Yelp / elastalert

Easy & Flexible Alerting With ElasticSearch
https://elastalert.readthedocs.org
Apache License 2.0

Delay in alerts #1365

Open goveebee opened 6 years ago

goveebee commented 6 years ago

I'm having a problem with a delay in my alerts. I can see the data in Kibana, but it still takes up to 5 minutes before the alerts are triggered. The ElastAlert query runs a couple of times while the data is already in Elasticsearch, but no alert is triggered.

Rule:

name: More than 10 processes last 5 min

# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe time
type: frequency

# (Required)
# Index to search, wildcard supported
index: monitor

# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 500

# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  minutes: 10
#  hours: 1

timestamp_field: "wleTime"

# (Required)
# A list of Elasticsearch filters used for finding events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
- query:
    query_string:
      query: "(activityName: Start AND TP) AND (_type: ActivitySummary)"

realert:
  minutes: 5
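
For reference, a quick way to cross-check what this rule should match is to run the same query with an explicit 10-minute wleTime window directly against Elasticsearch, e.g. with elasticsearch-py (a rough sketch; the host, index, timestamp field and query string are copied from the files in this issue):

# Sketch: count documents matching the rule's filter whose wleTime falls in
# the trailing 10 minutes, i.e. the same window the frequency rule evaluates.
from elasticsearch import Elasticsearch

es = Elasticsearch([{"host": "elastic.xxx.xx", "port": 9200}])

resp = es.count(
    index="monitor",
    body={
        "query": {
            "bool": {
                "must": [
                    {"query_string": {
                        "query": "(activityName: Start AND TP) AND (_type: ActivitySummary)"
                    }}
                ],
                "filter": [
                    {"range": {"wleTime": {"gte": "now-10m", "lte": "now"}}}
                ],
            }
        }
    },
)
print(resp["count"])  # compare against num_events (500)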

Config:

rules_folder: rules

# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  seconds: 30
#  minutes: 1

# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 15

# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: elastic.xxx.xx

# The Elasticsearch port
es_port: 9200

# The AWS region to use. Set this when using AWS-managed elasticsearch
#aws_region: us-east-1

# The AWS profile to use. Use this if you are using an aws-cli profile.
# See http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
# for details
#profile: test

# Optional URL prefix for Elasticsearch
#es_url_prefix: elasticsearch

# Connect with TLS to Elasticsearch
#use_ssl: True

# Verify TLS certificates
#verify_certs: True

# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET

# Optional basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# Use SSL authentication with client certificates. client_cert must be
# a PEM file containing both the cert and key for the client
#verify_certs: True
#ca_certs: /path/to/cacert.pem
#client_cert: /path/to/client_cert.pem
#client_key: /path/to/client_key.key

# The index on es_host which is used for metadata storage
# This can be an unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status

# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2
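
For context, a rough sketch of the query windows these settings imply (an approximation based on the documented behaviour that each run queries roughly the trailing buffer_time, not ElastAlert's exact code):

# Sketch of the query windows implied by run_every / buffer_time above.
from datetime import datetime, timedelta

run_every = timedelta(seconds=30)
buffer_time = timedelta(minutes=15)

now = datetime.utcnow()
for i in range(3):  # the next few runs
    run_time = now + i * run_every
    window_start = run_time - buffer_time
    print(f"run at {run_time:%H:%M:%S}: query wleTime from "
          f"{window_start:%H:%M:%S} to {run_time:%H:%M:%S}")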

Qmando commented 6 years ago

You posted config.yaml twice instead of your rule.

goveebee commented 6 years ago

Edited. Now both files are there. :)

Qmando commented 6 years ago

ElastAlert will trigger an alert as soon as it sees 500 documents within the most recent 10 minutes. Perhaps when looking at Kibana you are seeing 500 documents over a slightly longer time period.
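
Roughly, the frequency check behaves like this toy sketch (not ElastAlert's actual code): the alert only fires once num_events timestamps land inside a single timeframe-sized window, so 500 hits spread over, say, 12 minutes in Kibana will not trigger it.

# Toy illustration of the frequency check.
from datetime import timedelta

def would_fire(timestamps, num_events=500, timeframe=timedelta(minutes=10)):
    timestamps = sorted(timestamps)
    window = []
    for ts in timestamps:
        window.append(ts)
        # drop events older than timeframe relative to the newest event
        while ts - window[0] > timeframe:
            window.pop(0)
        if len(window) >= num_events:
            return True
    return False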

You should be able to see the exact number of hits for each query being made, both in the logs and in the elastalert_status index.
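
For example, something like this sketch can pull the last few status documents for the rule (the field names used here, rule_name, starttime, endtime, hits and matches, are the ones ElastAlert normally writes to elastalert_status; adjust if yours differ or if rule_name is analyzed):

# Sketch: show which time window each recent query covered and how many hits it got.
from elasticsearch import Elasticsearch

es = Elasticsearch([{"host": "elastic.xxx.xx", "port": 9200}])

resp = es.search(
    index="elastalert_status",
    body={
        "query": {"match": {"rule_name": "More than 10 processes last 5 min"}},
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 10,
    },
)
for hit in resp["hits"]["hits"]:
    s = hit["_source"]
    print(s["starttime"], "->", s["endtime"], "hits:", s["hits"], "matches:", s["matches"])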

I don't think there is any bug or issue here unless you can show some data of a discrepancy.

goveebee commented 6 years ago

I'm sure the data I see is the data that ElastAlert should trigger on. Still, it takes up to five minutes, and never less than two, even after multiple query runs.

Qmando commented 6 years ago

You can see from the logs exactly what time period is being queried. From that you should be able to see where the discrepancy is.

You could post them here; otherwise, there's not much I can do to help.

A screenshot from the same time showing the relevant data in Kibana would help too.