DMMCA closed this issue 7 years ago
@DMMCA Just to make sure I understand correctly: did the connection break between icingabeat <-> Elasticsearch, or between Icinga2 <-> icingabeat?
Original discussion: https://monitoring-portal.org/index.php?thread/40420-icingabeat/
There are two things that can happen regarding connection losses:
Icingabeat loses the connection to Icinga2
In this case, Icingabeat will periodically try to reconnect to Icinga2. The reconnect interval can be configured via retry_interval. Once the connection is re-established, Icinga2 continues to send events from that point on. There is no buffering, so events from the downtime are lost. There is, however, an open feature request to add buffering to the event stream API: https://github.com/Icinga/icinga2/issues/4604
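For reference, the reconnect interval is set in the icingabeat section of icingabeat.yml. The key name retry_interval comes from the discussion above; the exact nesting and value shown here are a sketch, so check your own reference config:

```yaml
icingabeat:
  # How long to wait before trying to reconnect to the Icinga2 API
  # after the connection drops (example value, adjust to your setup)
  retry_interval: 10s
```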
Icingabeat loses connection to Elasticsearch (or any other output)
As I understand it, this is what happened to you. If Icingabeat loses the connection to Elasticsearch but keeps receiving events from the Icinga2 API, those events are stored in queues. The size of the queues is configurable through queue_size and bulk_queue_size. From what I can tell, your queues were big enough to hold the old events, so when Elasticsearch came back up, all events in the queues were sent to it. These queues are in-memory, and setting the values too high can fill up your memory. There is a discussion about adding a feature to libbeat that buffers events to disk: https://github.com/elastic/beats/issues/575
Hope I could help you!
I was using the short version of the icingabeat.yml config file, not the full one. I thought the values you are talking about were being loaded by default:
-----EXTRACTED FROM FULL VERSION-----
# Internal queue size for single events in processing pipeline
#queue_size: 1000
# The internal queue size for bulk events in the processing pipeline.
# Do not modify this value.
#bulk_queue_size: 0
--------------------------------------------------
------From my most recent log after restarting icingabeat ----------------------------------
INFO Max Retries set to: 3
INFO Flush Interval set to: 1s
INFO Max Bulk Size set to: 50
INFO Flush Interval set to: 1s
--------------------------------------------------
So your advice is to configure those values to a minimum of what? And should the full version be used instead?
You don't need to use the full version; you can just copy the settings you want to override into your icingabeat.yml. If you don't want buffering, I suggest you set queue_size to a minimum, say queue_size: 5.
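A minimal override along those lines could look like this (a sketch: queue_size is a top-level libbeat setting in this config generation, as the commented defaults in the full reference config suggest):

```yaml
# Minimal icingabeat.yml override (sketch): copy only the settings
# you want to change from the full reference config.
# A very small queue keeps almost no old events in memory,
# effectively disabling buffering while the output is down.
queue_size: 5
```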
One more thing: with the max_retries setting you can configure how often icingabeat will try to send an event to Elasticsearch before it gives up and drops the event. The default is max_retries: 3. Decreasing this results in fewer events in the queue, because they are dropped sooner.
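The retry setting belongs to the output configuration; a hedged sketch assuming the Elasticsearch output (the hosts value is an example, not taken from the thread):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]  # example endpoint, adjust to your setup
  # Drop an event after this many failed delivery attempts.
  # 3 is the default, matching "Max Retries set to: 3" in the log above.
  max_retries: 3
```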
Hello, I want to report a possible issue with the icingabeat application. My testing environment has the following setup: an Icinga master (with icingabeat) and clients, no zones defined, just the standard setup, plus another VM running ELK. The problem was that when I left the ELK VM down for the weekend and turned it back on on Monday, I was receiving events from Saturday with a huge delay between them being displayed in Kibana. I had left it down to test the scenario of a client losing its link (this one being in a different geographic location from the ELK server). Several hours passed and still no "LIVE EVENT STREAM" was being displayed; only when I restarted icingabeat did it become "live" again.
Best Regards,
DMMCA