chrislujan opened 5 years ago
Have the same problem. Wondering how others are solving it.
I don't recall ever finding functionality that does what these flags claim. I don't think they're actually implemented. Someone prove me wrong.
I have tried the `--patience minutes=5` flag at run time as well as `es_conn_timeout` in the config file, to give the Elasticsearch instance time to come up. Neither seems to work.
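The value passed to `--patience` is probably parsed fine: the `elastalert.py` excerpt in this thread notes that it ends up as a `timedelta` in `self.args.timeout`. A minimal sketch of that kind of conversion, assuming the flag takes a `<units>=<number>` pair like `minutes=5`; `parse_patience` is a hypothetical helper, not ElastAlert's actual parsing code:

```python
from datetime import timedelta


def parse_patience(value: str) -> timedelta:
    """Hypothetical helper: turn a string such as "minutes=5" into a timedelta.

    Assumes the "<units>=<number>" format used with --patience in this thread.
    """
    units, _, number = value.partition("=")
    if not number:
        raise ValueError(f"expected <units>=<number>, got {value!r}")
    # timedelta accepts keyword arguments such as seconds=, minutes=, hours=, days=
    return timedelta(**{units.strip(): int(number)})


print(parse_patience("minutes=5"))                   # 0:05:00
print(parse_patience("seconds=30").total_seconds())  # 30.0
```

If the timeout is parsed correctly but still has no effect, the problem is more likely *when* the wait happens than *how long* it is.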
I just hit this issue as well. Running `elastalert --patience ...` has no impact: if Elasticsearch is not immediately reachable, ElastAlert fails to start.
I think I figured out why. Maybe @Qmando can provide some insight on how to fix this.
Here is a stripped-down version of `elastalert/elastalert.py`, presented slightly out of order to make the timing more obvious.
```python
def main(args=None):
    ...
    # This constructor tries to access Elasticsearch immediately
    # and an exception is raised if Elasticsearch is not available.
    client = ElastAlerter(args)

    # The `.start()` method internally calls `.wait_until_responsive()`.
    # This has no impact because the constructor above already
    # raised an exception.
    if not client.args.silence:
        client.start()


class ElastAlerter():
    # This is called directly from `main()`. In this method, `self.args.timeout`
    # is a `timedelta` object converted from the `--patience` option.
    def start(self):
        ...
        self.wait_until_responsive(timeout=self.args.timeout)
        ...

    # However, the constructor has to complete first.
    #
    # The call to `.init_rule()` is the one that raises an exception.
    def __init__(self, args):
        self.parse_args(args)
        ...
        for rule in self.rules:
            if not self.init_rule(rule):
                ...
        ...

    # Internally, this tries to send a notification email because Elasticsearch
    # is unreachable.
    def init_rule(self, new_rule, new=True):
        try:
            self.modify_rule_for_ES5(new_rule)
        except TransportError as e:
            elastalert_logger.warning('Error connecting to Elasticsearch for rule {}. '
                                      'The rule has been disabled.'.format(new_rule['name']))
            self.send_notification_email(exception=e, rule=new_rule)
            return False
        ...

    def send_notification_email(self, text='', exception=None, rule=None, subject=None, rule_file=None):
        ...
        try:
            smtp = SMTP(self.smtp_host)
            smtp.sendmail(self.from_addr, recipients, email.as_string())
        except (SMTPException, error) as e:
            self.handle_error('Error connecting to SMTP host: %s' % (e), {'email_body': email_body})

    # This method tries to handle the SMTP connection error and decides to write
    # that error to Elasticsearch.
    def handle_error(self, message, data=None):
        ...
        self.writeback('elastalert_error', body)
```
In short, ElastAlert tries to write to Elasticsearch to inform the user that Elasticsearch is not available.
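The ordering problem can be reproduced without Elasticsearch at all. Below is a simplified sketch that assumes a hypothetical `FakeBackend`, which only becomes reachable after a short delay, and an `Alerter` class standing in for `ElastAlerter`. Unlike the real code, the exception here escapes directly from the constructor's version lookup instead of going through the email and writeback path:

```python
import time
from datetime import timedelta


class BackendUnavailable(Exception):
    """Stand-in for elasticsearch's TransportError in this sketch."""


class FakeBackend:
    """Hypothetical backend that becomes reachable `ready_after` seconds from now."""

    def __init__(self, ready_after: float):
        self._ready_at = time.monotonic() + ready_after

    def ping(self) -> bool:
        return time.monotonic() >= self._ready_at

    def info(self) -> dict:
        if not self.ping():
            raise BackendUnavailable("connection refused")
        return {"version": {"number": "7.10.0"}}


class Alerter:
    """Mimics the order of operations in the excerpt: the constructor
    contacts the backend, and the patience loop only runs in start()."""

    def __init__(self, backend: FakeBackend, timeout: timedelta):
        self.backend = backend
        self.timeout = timeout
        # Like init_rule() calling modify_rule_for_ES5(): a version lookup at load time.
        self.version = backend.info()["version"]["number"]

    def wait_until_responsive(self) -> None:
        deadline = time.monotonic() + self.timeout.total_seconds()
        while not self.backend.ping():
            if time.monotonic() > deadline:
                raise BackendUnavailable("gave up waiting")
            time.sleep(0.05)

    def start(self) -> None:
        self.wait_until_responsive()  # never reached if __init__ already failed


backend = FakeBackend(ready_after=0.2)
try:
    Alerter(backend, timeout=timedelta(seconds=5)).start()
    outcome = "started"
except BackendUnavailable:
    outcome = "failed in constructor"
print(outcome)  # failed in constructor
```

Even with five seconds of patience and a backend that is ready after 0.2 s, the wait loop in `start()` never gets a chance to run.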
In case it helps others, I work around this issue by polling Elasticsearch from a separate script until it is reachable, and only then starting ElastAlert.
Here is a gist containing my helper script: https://gist.github.com/AndreLouisCaron/782e09443588e3d9d5167623c8fc8a08
A few notes:
- … (`ES_HOST`, `ES_PORT` and `ES_USE_SSL`). It will NOT read from `config.yaml` NOR from `rules/*.yaml`.
- … `AWS_DEFAULT_REGION` and set your access keys using any method supported by `boto3`
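The same idea as a minimal Python sketch. This is not the gist's actual code. It builds a URL from the `ES_HOST`, `ES_PORT` and `ES_USE_SSL` variables named in the notes (the defaults and the values accepted as "true" for `ES_USE_SSL` are assumptions), then polls that URL with the standard library's `urllib` until it gets a 2xx response or the timeout runs out. It does not handle the AWS/`boto3` authentication mentioned above, so treat it as a starting point only:

```python
import http.client
import os
import time
import urllib.request


def wait_for_elasticsearch(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `url` until it answers with a 2xx status or `timeout` seconds pass.

    Returns True as soon as the server responds, False if we give up.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen raises HTTPError (a URLError) for 4xx/5xx responses, so a
            # secured cluster answering 401 is treated as "not ready" here.
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, connection refused and socket timeouts are all OSError
            # subclasses; HTTPException covers malformed replies during startup.
            pass
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


def es_url_from_env() -> str:
    # Variable names follow the notes above; the defaults are assumptions.
    host = os.environ.get("ES_HOST", "localhost")
    port = os.environ.get("ES_PORT", "9200")
    use_ssl = os.environ.get("ES_USE_SSL", "").lower() in ("1", "true", "yes")
    return f"{'https' if use_ssl else 'http'}://{host}:{port}/"
```

Wrapped in a small script that exits with `0 if wait_for_elasticsearch(es_url_from_env()) else 1`, it can be chained in front of ElastAlert (e.g. `python wait_for_es.py && elastalert ...`, where the file name is hypothetical), so ElastAlert only starts once the cluster answers.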
> I don't think they're actually implemented. Someone prove me wrong

😆
It is certainly implemented. I think I broke it when I added support for connecting different rules to different Elasticsearch clusters running different versions. Now, when a rule loads, ElastAlert first tries to determine the Elasticsearch version, whereas previously it didn't connect until the rule actually RAN. This means the error occurs before `wait_until_responsive` is even called.
The fix should be pretty straightforward: move `wait_until_responsive` into `__init__` instead of `start`.
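Here is a minimal sketch of that reordering. It uses a hypothetical `FakeBackend` stand-in and a hypothetical `PatientAlerter` class rather than ElastAlert's real classes. The constructor waits for the backend, honoring a `--patience`-style `timedelta`, before it does the version lookup that rule loading needs:

```python
import time
from datetime import timedelta


class BackendUnavailable(Exception):
    """Stand-in for elasticsearch's TransportError in this sketch."""


class FakeBackend:
    """Hypothetical backend that becomes reachable `ready_after` seconds from now."""

    def __init__(self, ready_after: float):
        self._ready_at = time.monotonic() + ready_after

    def ping(self) -> bool:
        return time.monotonic() >= self._ready_at

    def info(self) -> dict:
        if not self.ping():
            raise BackendUnavailable("connection refused")
        return {"version": {"number": "7.10.0"}}


class PatientAlerter:
    def __init__(self, backend: FakeBackend, timeout: timedelta):
        self.backend = backend
        # Proposed change: wait *before* anything (like rule loading) touches the backend.
        self.wait_until_responsive(timeout)
        self.version = backend.info()["version"]["number"]

    def wait_until_responsive(self, timeout: timedelta) -> None:
        deadline = time.monotonic() + timeout.total_seconds()
        while not self.backend.ping():
            if time.monotonic() > deadline:
                raise BackendUnavailable(f"not responsive after {timeout}")
            time.sleep(0.05)

    def start(self) -> None:
        pass  # the main run loop would go here; no startup wait needed any more


alerter = PatientAlerter(FakeBackend(ready_after=0.2), timeout=timedelta(seconds=5))
print(alerter.version)  # 7.10.0
```

When the wait runs before the version lookup, a backend that comes up within the timeout no longer aborts startup. A backend that stays down still fails, but only after the configured patience instead of immediately.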
I have a working config, but I would like ElastAlert to be patient, because Elasticsearch sometimes takes a minute to come up after a restart. I have tried using `--patience` with no luck, and I'm not sure what I'm doing wrong. I have also noticed that if ES is not responding, rules are disabled regardless of my use of `disable_rules_on_error` (see the 5th warning in the log). Is it possible that my configuration isn't being applied?
Here's my config:
Here's my command and output: