steffen-poulsen closed this issue 3 years ago.
@lippserd We've deprecated Livestatus. Will anything ever happen here?
No, I don't think so. But that may have been fixed on the LMD side anyway.
Describe the bug
Our LMD instance retrieves the full service table from the Icinga Livestatus API once every 5 minutes, instead of the small delta update we expect.
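To illustrate the difference: a full update requests every row of the services table, while a delta update only asks for rows that changed since the previous poll. The Python sketch below builds both query shapes. It assumes the delta is expressed as a filter on `last_check`; the function name, the column list, and the timestamp are hypothetical, and this is not necessarily the exact query LMD sends.

```python
def build_service_query(columns, since=None):
    """Build a Livestatus query for the services table (illustrative only).

    If `since` (a Unix timestamp) is given, a Filter on last_check is added
    so only services checked since then are returned (delta update).
    Otherwise the whole table is requested (full update).
    """
    lines = ["GET services", "Columns: " + " ".join(columns)]
    if since is not None:
        # Assumed delta criterion: services checked at or after `since`.
        lines.append(f"Filter: last_check >= {int(since)}")
    lines += ["OutputFormat: json", "ResponseHeader: fixed16"]
    # A Livestatus query is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


full = build_service_query(["host_name", "description", "state"])
delta = build_service_query(["host_name", "description", "state"], since=1700000000)
print(delta)
```

On a large installation, the full query's response grows with the total number of services, while the delta's grows only with the number of recently checked services. That difference is why a periodic full fetch stands out in the timings.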
This came up while working with the LMD in https://github.com/sni/lmd/issues/106.
We are seeing delays in our Thruk GUI and are looking into ways to reduce them.
The symptom, as seen in the LMD log, looks like this:
This is a cyclic pattern, with one slow update and 5-6 fast updates every 5 minutes.
Instead, we expect to see only fast updates: one every 5 s, each taking a few hundred milliseconds.
This is what we consistently see with our Nagios backends, and what we hope to see with our Icinga backends as well.
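One way to confirm the 5-minute cycle is to measure the spacing between slow updates. The Python sketch below assumes the LMD log lines have already been parsed into (timestamp, duration) pairs; the log format is not reproduced here. The function name, the 1-second threshold, and the sample data are all hypothetical.

```python
def slow_update_gaps(updates, threshold_s=1.0):
    """Return the seconds between consecutive slow updates.

    `updates` is a list of (timestamp_s, duration_s) tuples. An update
    counts as slow when its duration exceeds `threshold_s`.
    """
    slow = sorted(ts for ts, dur in updates if dur > threshold_s)
    return [b - a for a, b in zip(slow, slow[1:])]


# Hypothetical sample: one slow update every 300 s, fast ones in between.
sample = [(0, 4.2), (60, 0.3), (120, 0.2), (300, 4.0), (360, 0.3), (600, 4.1)]
print(slow_update_gaps(sample))  # [300, 300]
```

Gaps that cluster tightly around 300 s point to a fixed-interval trigger rather than to load-dependent slowness.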
We would like to understand what causes this behavior.
The only correlation we have found so far is with Icinga dumping its program state, which also happens every 5 minutes.
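The suspected link can be quantified by checking how many slow updates start shortly after a state dump. The Python sketch below assumes both sets of timestamps have already been extracted from the respective logs. The function name, the 30-second window, and the sample values are hypothetical.

```python
import bisect


def fraction_after_dump(slow_ts, dump_ts, window_s=30.0):
    """Fraction of slow updates starting within `window_s` seconds after a dump.

    `slow_ts` and `dump_ts` are Unix timestamps in seconds. A value close
    to 1.0 supports the idea that the dump and the slow update are linked.
    """
    if not slow_ts:
        return 0.0
    dumps = sorted(dump_ts)
    hits = 0
    for ts in slow_ts:
        # Index of the last dump at or before this slow update.
        i = bisect.bisect_right(dumps, ts) - 1
        if i >= 0 and ts - dumps[i] <= window_s:
            hits += 1
    return hits / len(slow_ts)


dumps = [0, 300, 600, 900]
slow = [5, 304, 610, 750]
print(fraction_after_dump(slow, dumps))  # 0.75
```

Shifting the dump interval, or the window, and rerunning gives a rough sense of whether the overlap is more than coincidence.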
Any pointers on the likely root cause, or on how to debug this further, are very much appreciated.
The combined Icinga and LMD log looks like this, which suggests to us that the two are related in some way:
To Reproduce
There is nothing special about our setup that I can think of, so I am unsure how anyone could reproduce this.
Our setup is growing, and we did not see this behavior at first, so it may be related to the size of the environment.
We currently have 4.5k hosts and 35k service checks, spread roughly evenly across 200 endpoints.
Expected behavior
We expect LMD to always be able to retrieve small, fast delta updates from the Livestatus API.
Your Environment
icinga2 --version:
icinga2 feature list:
icinga2 daemon -C:
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes: 200 endpoints, 202 zones, very similar in shape and size.
Additional context
We expect to grow by around 300 endpoints per year over the coming years, from the current 200.
If this issue is related to scaling, we would like to know which limits we are hitting now or are likely to hit.
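A rough projection shows how the service table, and with it the cost of a full update, would grow. The Python sketch below assumes linear growth of 300 endpoints per year and that new endpoints resemble the current ones. From the figures above, that works out to about 22.5 hosts and 175 services per endpoint. Both assumptions, and the function name, are hypothetical.

```python
def project(endpoints_now=200, per_year=300, years=3,
            hosts_now=4_500, services_now=35_000):
    """Project (year, endpoints, hosts, services), assuming linear endpoint
    growth and a constant number of hosts/services per endpoint."""
    hosts_per_ep = hosts_now / endpoints_now        # 22.5
    services_per_ep = services_now / endpoints_now  # 175.0
    rows = []
    for year in range(years + 1):
        eps = endpoints_now + per_year * year
        rows.append((year, eps, round(eps * hosts_per_ep),
                     round(eps * services_per_ep)))
    return rows


for year, eps, hosts, services in project():
    print(f"year {year}: {eps} endpoints, ~{hosts} hosts, ~{services} services")
```

Under these assumptions the service count rises from 35,000 to about 192,500 within three years, roughly 5.5 times the current table that each full update has to transfer.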