Closed: hc4 closed this issue 4 years ago
The message is logged once per alert. When Elasticsearch is not available, this is an exceptional circumstance and we log loudly. That's very unlikely to change.
There is no sense in logging it a thousand times per second :)
Fair enough
Maybe I'm wrong, but it seems that Graylog retries the request immediately after an error, without any grace period, and this leads to many errors. Adding a pause before resending the request after an error would solve the problem.
PS: why does this still have the "won't fix" tag?
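The grace-period idea above amounts to retrying with a delay that grows after each consecutive failure. A minimal sketch of such an exponential backoff, purely illustrative and not Graylog's actual retry code (the method and parameter names are hypothetical):

```java
public class Backoff {
    // Delay (ms) before the n-th consecutive retry: baseMs * 2^attempt,
    // capped at maxMs so the wait never grows without bound.
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long d = baseMs << Math.min(attempt, 16); // exponential growth, shift bounded
        return Math.min(d, maxMs);
    }

    public static void main(String[] args) {
        // Early retries come quickly, later ones are capped at 30 s.
        System.out.println(delayMs(0, 500, 30_000));  // 500
        System.out.println(delayMs(3, 500, 30_000));  // 4000
        System.out.println(delayMs(10, 500, 30_000)); // 30000
    }
}
```

With something like this, a dead ES cluster would see a request every 30 seconds per alert instead of many per second.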
Because I forgot to remove it :) The error comes out of the inner loop and should only be logged once per configured alert. We'll look into it during the beta phase.
What I found out is that the error message is printed for each configured and active (the stream it is created for is not paused) alert condition once per alert scanner run. As we do not know if an alert condition requires ES or anything else (as conditions are pluggable now) there is no easy way at the moment to reduce the amount of logging. A solution might be to introduce a circuit breaker for the indexer, which is beyond the scope of 2.2.
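To illustrate the circuit-breaker idea mentioned above: after a threshold of consecutive indexer failures the breaker "opens" and further checks are skipped until a cooldown elapses, so an outage produces one log line per cooldown instead of one per alert per run. This is a hedged sketch under assumed semantics, not Graylog's API; all names here are hypothetical.

```java
public class IndexerBreaker {
    private final int threshold;   // consecutive failures before opening
    private final long cooldownMs; // how long to stay open
    private int failures = 0;
    private long openedAt = -1;

    IndexerBreaker(int threshold, long cooldownMs) {
        this.threshold = threshold;
        this.cooldownMs = cooldownMs;
    }

    // Should an indexer call be attempted right now?
    boolean allowRequest(long nowMs) {
        if (failures < threshold) return true;   // closed: proceed normally
        if (nowMs - openedAt >= cooldownMs) {    // half-open: allow one probe
            failures = threshold - 1;
            return true;
        }
        return false;                            // open: skip without logging
    }

    void onSuccess() { failures = 0; }           // any success closes the breaker

    void onFailure(long nowMs) {
        failures++;
        if (failures == threshold) openedAt = nowMs; // trip: record when we opened
    }
}
```

Each alert condition would call `allowRequest` before touching the indexer, log once when the breaker trips, and stay quiet until the probe succeeds.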
@hc4: how many alert conditions do you have configured? Are you running the alert checking at a non-default interval?
I have 1-2 alerts per stream, about 10 alerts in total. The alert checking interval is the default.
How many streams do you have in total?
8, but not all of them have alerts
This should no longer be an issue with the new alerting and the events system. Please open a new issue if this is still a problem with the new system. Thank you!
When the ES cluster is not available, I get tons of errors about skipped alert checks.
This is a search for such errors. They happen many times per second.
Expected Behavior
There must be some grace period, because currently it seems that Graylog is DDoS'ing my ES cluster :)
Your Environment