Yelp / elastalert

Easy & Flexible Alerting With ElasticSearch
https://elastalert.readthedocs.org
Apache License 2.0

--patience flag and disable_rules_on_error doesn't seem to work #1926

Open chrislujan opened 5 years ago

chrislujan commented 5 years ago

version:

$ sudo pip list | grep elastalert
elastalert                       0.1.33

I have a functioning config but I would like elastalert to be patient as elasticsearch sometimes takes a minute to come up after a restart. I have tried using --patience with no luck. I'm not sure what I'm doing wrong. I have also noticed that if ES is not responding, rules are disabled, regardless of my use of disable_rules_on_error (see 5th warning of log). Is it possible that my configurations aren't being applied?

Here's my config:

rules_folder: /etc/elastalert/rules

run_every:
  minutes: 1

buffer_time:
  minutes: 15

es_host: 127.0.0.1

es_port: 9201

writeback_index: elastalert

alert_time_limit:
  hours: 3

disable_rules_on_error: false

Here's my command and output:

$ time sudo /usr/bin/python2 /usr/bin/elastalert --verbose --patience minutes=5 --config config.yaml
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.002s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2a78590>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.003s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b08950>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.003s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b08bd0>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.003s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b08b10>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elastalert:Error connecting to Elasticsearch for rule Sensitive Action. The rule has been disabled.
ERROR:root:Error connecting to SMTP host: {}
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.002s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2bbc7d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.003s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2bbc9d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.003s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2bbc750>: Failed to establish a new connection: [Errno 111] Connection refused',))
WARNING:elasticsearch:GET http://127.0.0.1:9201/ [status:N/A request:0.003s]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2bbc510>: Failed to establish a new connection: [Errno 111] Connection refused',))
Traceback (most recent call last):
  File "/usr/bin/elastalert", line 11, in <module>
    load_entry_point('elastalert==0.1.33', 'console_scripts', 'elastalert')()
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 1897, in main
    client = ElastAlerter(args)
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 161, in __init__
    if not self.init_rule(rule):
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 929, in init_rule
    self.send_notification_email(exception=e, rule=new_rule)
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 1830, in send_notification_email
    self.handle_error('Error connecting to SMTP host: %s' % (e), {'email_body': email_body})
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 1781, in handle_error
    self.writeback('elastalert_error', body)
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 1451, in writeback
    if(self.is_atleastsix()):
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 182, in is_atleastsix
    return int(self.es_version.split(".")[0]) >= 6
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 175, in es_version
    self._es_version = self.get_version()
  File "/usr/lib/python2.7/site-packages/elastalert/elastalert.py", line 169, in get_version
    info = self.writeback_es.info()
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 241, in info
    return self.transport.perform_request('GET', '/', params=params)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 85, in perform_request
    raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2bbc510>: Failed to establish a new connection: [Errno 111] Connection refused',))) caused by: ConnectionError(HTTPConnectionPool(host='127.0.0.1', port=9201): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2bbc510>: Failed to establish a new connection: [Errno 111] Connection refused',)))

real    0m23.472s
user    0m1.142s
sys     0m0.139s

kyatapi commented 5 years ago

Have the same problem. Wondering how others are solving it.

chrislujan commented 5 years ago

Have the same problem. Wondering how others are solving it.

I don't recall ever finding functionality that reflects the claims of these flags. I don't think they're actually implemented. Someone prove me wrong

krishnakumar27 commented 5 years ago

I have tried the --patience minutes=5 flag at run time as well as es_conn_timeout in the config file to give the Elasticsearch instance time to come up. Neither seems to work.

AndreLouisCaron commented 5 years ago

I just hit this issue as well. Running elastalert --patience ... has no impact: if ElasticSearch is not immediately reachable, elastalert fails to start.

I think I figured out why. Maybe @Qmando can provide some insight on how to fix this.

Here is a stripped-down version of elastalert/elastalert.py, presented slightly out of order to make the timing more obvious.

def main(args=None):
    ...
    # This constructor tries to access ElasticSearch immediately
    # and an exception is raised if ElasticSearch is not available.
    client = ElastAlerter(args)

    # The `.start()` method internally calls `.wait_until_responsive()`.
    # This has no impact because the constructor above already
    # raised an exception.
    if not client.args.silence:
        client.start()

class ElastAlerter():

    # This is called directly from `main()`.  In this method, `self.args.timeout`
    # is a `timedelta` object converted from the `--patience` option.
    def start(self):
        ...
        self.wait_until_responsive(timeout=self.args.timeout)
        ...

    # However, the constructor has to complete first.
    #
    # The call to `.init_rule()` is the one that raises an exception.
    def __init__(self, args):
        self.parse_args(args)
        ...
        for rule in self.rules:
            if not self.init_rule(rule):
                ...
        ...

    # Internally, this tries to send a notification email because ElasticSearch
    # is unreachable.
    def init_rule(self, new_rule, new=True):
        try:
            self.modify_rule_for_ES5(new_rule)
        except TransportError as e:
            elastalert_logger.warning('Error connecting to Elasticsearch for rule {}. '
                                      'The rule has been disabled.'.format(new_rule['name']))
            self.send_notification_email(exception=e, rule=new_rule)
            return False
        ...

    def send_notification_email(self, text='', exception=None, rule=None, subject=None, rule_file=None):
        ...
        try:
            smtp = SMTP(self.smtp_host)
            smtp.sendmail(self.from_addr, recipients, email.as_string())
        except (SMTPException, error) as e:
            self.handle_error('Error connecting to SMTP host: %s' % (e), {'email_body': email_body})

    # This method tries to handle the SMTP connection error and decides to write
    # that error to ElasticSearch.
    def handle_error(self, message, data=None):
        ...
        self.writeback('elastalert_error', body)

In short, ElastAlert tries to write to ElasticSearch to inform the user that ElasticSearch is not available.
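
The ordering problem can be shown in miniature. The sketch below is not ElastAlert code: `ToyAlerter`, `Unreachable`, `run` and the `probe` callable are hypothetical stand-ins. It assumes only the call order described above, where the constructor contacts the cluster and the wait lives in `start()`.

```python
class Unreachable(Exception):
    """Hypothetical stand-in for a connection error from the cluster."""


class ToyAlerter:
    """Toy model of the call order described above; not ElastAlert's real class."""

    def __init__(self, probe):
        self.probe = probe
        self.waited = False
        # Like init_rule(): touches the cluster during construction.
        self.probe()

    def start(self):
        # Like wait_until_responsive(): only reachable if __init__ returned.
        self.waited = True


def run(probe):
    """Mimic main(): construct first, then start. Returns what happened."""
    try:
        client = ToyAlerter(probe)
    except Unreachable:
        return "failed in constructor, start() never ran"
    client.start()
    return "started"


def down():
    """Probe for a cluster that refuses connections."""
    raise Unreachable("connection refused")


print(run(down))          # failed in constructor, start() never ran
print(run(lambda: None))  # started
```

However long the wait in `start()` is configured to be, it cannot help when the constructor has already raised.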

AndreLouisCaron commented 5 years ago

In case this may help others, I work around this issue by polling ElasticSearch in a separate script until it is reachable before starting ElastAlert.

Here is a gist containing my helper script: https://gist.github.com/AndreLouisCaron/782e09443588e3d9d5167623c8fc8a08

A few notes:

  1. It only accepts configuration from environment variables (ES_HOST, ES_PORT and ES_USE_SSL). It will NOT read from config.yaml NOR from rules/*.yaml.
  2. It supports access to ElasticSearch without authentication.
  3. It supports access to AWS ElasticSearch with authentication. Just set AWS_DEFAULT_REGION and set your access keys using any method supported by boto3.
  4. It only supports ElasticSearch 5. You may need to tweak the script slightly for ElasticSearch 6 due to the change related to suppression of document types.
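
For readers who cannot open the gist, the polling idea can be sketched with the standard library alone. This is not the gist's code, and it covers only the unauthenticated case. The environment variable names come from the notes above; the defaults, the accepted values for `ES_USE_SSL`, the function names and the `opener` parameter (there so the loop can be exercised without a network) are assumptions.

```python
import os
import time
import urllib.request


def wait_for_elasticsearch(url, timeout=300.0, interval=2.0,
                           opener=urllib.request.urlopen):
    """Poll `url` until it answers or `timeout` seconds pass.

    Returns True once a request succeeds, False if the deadline passes.
    An HTTP error status (for example 401) also counts as "not ready",
    because urllib raises it as an OSError subclass.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5):
                return True
        except OSError:
            # Connection refused, DNS failure, timeout, HTTP error status.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def url_from_env(env=os.environ):
    """Build the base URL from ES_HOST, ES_PORT and ES_USE_SSL."""
    # Assumption: these spellings of "true" enable HTTPS.
    use_ssl = env.get("ES_USE_SSL", "").lower() in ("1", "true", "yes")
    scheme = "https" if use_ssl else "http"
    host = env.get("ES_HOST", "127.0.0.1")  # assumed default
    port = env.get("ES_PORT", "9200")       # assumed default
    return "%s://%s:%s/" % (scheme, host, port)
```

A wrapper script could call `wait_for_elasticsearch(url_from_env())` and launch elastalert only when it returns True.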

Qmando commented 5 years ago

I don't think they're actually implemented. Someone prove me wrong 😆

It is certainly implemented. I think I broke it when I added functionality for multiple rules to be connected to different elasticsearch clusters with different versions. Now, when a rule loads, it tries to figure out Elasticsearch version first, whereas previously it wouldn't connect until it actually RAN. This means that the error occurs before wait_until_responsive is even called.

The fix should be pretty straightforward. Move wait_until_responsive into __init__ instead of start.
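
A minimal sketch of that reordering, under stated assumptions: `PatientAlerter`, its `ping` callable and the `interval` parameter are hypothetical, and the real `wait_until_responsive` may take different arguments. It only shows the wait running in the constructor before any rule is initialised.

```python
import time


class PatientAlerter:
    """Hypothetical model of the proposed ordering; not the actual patch."""

    def __init__(self, ping, rules, timeout=300.0, interval=1.0):
        self.ping = ping
        # Moved here from start(): block until the cluster answers.
        self.wait_until_responsive(timeout, interval)
        # Only now run the per-rule setup that needs a live cluster.
        self.rules = [rule for rule in rules if self.init_rule(rule)]

    def wait_until_responsive(self, timeout, interval):
        """Poll ping() until it is truthy or `timeout` seconds have passed."""
        deadline = time.monotonic() + timeout
        while not self.ping():
            if time.monotonic() >= deadline:
                raise RuntimeError(
                    "Elasticsearch still unreachable after %.0fs" % timeout)
            time.sleep(interval)

    def init_rule(self, rule):
        # Stand-in for the version lookup that contacts the cluster.
        return self.ping()
```

With this order, a cluster that comes up within the patience window no longer causes rules to be disabled during construction.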