Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

LMD gets the whole livestatus service table every 5 minutes, instead of just the latest delta changes as expected #8571

Closed steffen-poulsen closed 3 years ago

steffen-poulsen commented 3 years ago

Describe the bug

Our LMD daemon erroneously retrieves the service table fully from the Icinga livestatus API once every 5 minutes, instead of just a small partial update as expected.

This came up while working with the LMD in https://github.com/sni/lmd/issues/106.

We are seeing some delays in our Thruk GUI, and we are currently looking into how we might be able to reduce delays.

The symptom that we are seeing, looks like this from the LMD log:

$ cat /var/log/lmd.log | grep Master | grep delta | tail -20
[2020-12-20 07:44:02.410][Debug][datastoreset.go:190] [Master] delta update complete in: 4.064986393s
[2020-12-20 07:44:03.448][Debug][datastoreset.go:190] [Master] delta update complete in: 155.261513ms
[2020-12-20 07:44:08.468][Debug][datastoreset.go:190] [Master] delta update complete in: 175.2933ms
[2020-12-20 07:44:13.434][Debug][datastoreset.go:190] [Master] delta update complete in: 141.311304ms
[2020-12-20 07:44:18.412][Debug][datastoreset.go:190] [Master] delta update complete in: 119.765381ms
[2020-12-20 07:44:23.432][Debug][datastoreset.go:190] [Master] delta update complete in: 139.721324ms
[2020-12-20 07:50:01.720][Debug][datastoreset.go:190] [Master] delta update complete in: 5m33.427098019s
[2020-12-20 07:50:06.182][Debug][datastoreset.go:190] [Master] delta update complete in: 4.288491672s
[2020-12-20 07:50:06.328][Debug][datastoreset.go:190] [Master] delta update complete in: 145.891187ms
[2020-12-20 07:50:11.885][Debug][datastoreset.go:190] [Master] delta update complete in: 592.139606ms
[2020-12-20 07:50:16.427][Debug][datastoreset.go:190] [Master] delta update complete in: 134.410727ms
[2020-12-20 07:50:21.429][Debug][datastoreset.go:190] [Master] delta update complete in: 135.659986ms
[2020-12-20 07:50:26.671][Debug][datastoreset.go:190] [Master] delta update complete in: 377.979874ms
[2020-12-20 07:55:20.289][Debug][datastoreset.go:190] [Master] delta update complete in: 4m48.996771052s
[2020-12-20 07:55:26.917][Debug][datastoreset.go:190] [Master] delta update complete in: 6.478754731s
[2020-12-20 07:55:27.237][Debug][datastoreset.go:190] [Master] delta update complete in: 319.991544ms
[2020-12-20 07:55:31.443][Debug][datastoreset.go:190] [Master] delta update complete in: 150.762215ms
[2020-12-20 07:55:36.453][Debug][datastoreset.go:190] [Master] delta update complete in: 159.737648ms
[2020-12-20 07:55:41.472][Debug][datastoreset.go:190] [Master] delta update complete in: 179.066883ms
[2020-12-20 07:55:46.493][Debug][datastoreset.go:190] [Master] delta update complete in: 200.07957ms

This is a cyclic pattern, with one slow update and 5-6 fast updates every 5 minutes.

Instead of this pattern, we are expecting to see only fast updates - one update every 5s, each taking only a few 100 milliseconds.

These fast and frequent updates are what we are consistently seeing with our Nagios backends, and it is what we hope to see with our Icinga backends as well.

We are wondering what the cause of this behavior is.

The only correlation we have seen so far, is a correlation with the Icinga program state being dumped. The dump happens every 5m also.

Any input and directions on what the root cause might be and how we could debug this further are very much appreciated.

The combined Icinga and LMD log looks like this - suggesting to us that there might be a relationship between them in some way.

Icinga: "Dumping program state to file"
LMD: "delta update complete"

[2020-12-20 08:54:48.481][Debug][datastoreset.go:190] [Master] delta update complete in: 6.140863504s
[2020-12-20 08:54:48.615][Debug][datastoreset.go:190] [Master] delta update complete in: 133.948009ms
[2020-12-20 08:54:53.438][Debug][datastoreset.go:190] [Master] delta update complete in: 145.910362ms
[2020-12-20 08:54:58.444][Debug][datastoreset.go:190] [Master] delta update complete in: 151.851125ms
[2020-12-20 08:55:03.682][Debug][datastoreset.go:190] [Master] delta update complete in: 389.906444ms
[2020-12-20 08:55:08.439][Debug][datastoreset.go:190] [Master] delta update complete in: 146.073191ms
[2020-12-20 08:56:55 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:02:00 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:02:45.577][Debug][datastoreset.go:190] [Master] delta update complete in: 7m32.284273005s
[2020-12-20 09:02:50.221][Debug][datastoreset.go:190] [Master] delta update complete in: 4.504214494s
[2020-12-20 09:02:50.340][Debug][datastoreset.go:190] [Master] delta update complete in: 119.89317ms
[2020-12-20 09:02:55.411][Debug][datastoreset.go:190] [Master] delta update complete in: 118.649579ms
[2020-12-20 09:03:00.531][Debug][datastoreset.go:190] [Master] delta update complete in: 120.189994ms
[2020-12-20 09:03:05.436][Debug][datastoreset.go:190] [Master] delta update complete in: 143.19224ms
[2020-12-20 09:03:10.425][Debug][datastoreset.go:190] [Master] delta update complete in: 133.044604ms
[2020-12-20 09:07:05 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:07:45.191][Debug][datastoreset.go:190] [Master] delta update complete in: 4m29.898820201s
[2020-12-20 09:07:50.749][Debug][datastoreset.go:190] [Master] delta update complete in: 5.423187908s
[2020-12-20 09:07:50.867][Debug][datastoreset.go:190] [Master] delta update complete in: 118.465535ms
[2020-12-20 09:07:55.421][Debug][datastoreset.go:190] [Master] delta update complete in: 128.104366ms
[2020-12-20 09:08:00.555][Debug][datastoreset.go:190] [Master] delta update complete in: 146.868547ms
[2020-12-20 09:08:05.593][Debug][datastoreset.go:190] [Master] delta update complete in: 300.831588ms
[2020-12-20 09:08:10.422][Debug][datastoreset.go:190] [Master] delta update complete in: 129.258117ms
[2020-12-20 09:12:11 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:13:05.199][Debug][datastoreset.go:190] [Master] delta update complete in: 4m49.90613987s
[2020-12-20 09:13:09.794][Debug][datastoreset.go:190] [Master] delta update complete in: 4.480850288s
[2020-12-20 09:13:10.418][Debug][datastoreset.go:190] [Master] delta update complete in: 125.970845ms
[2020-12-20 09:13:15.433][Debug][datastoreset.go:190] [Master] delta update complete in: 140.706912ms
[2020-12-20 09:13:20.443][Debug][datastoreset.go:190] [Master] delta update complete in: 141.777216ms
[2020-12-20 09:13:25.419][Debug][datastoreset.go:190] [Master] delta update complete in: 126.642408ms
[2020-12-20 09:13:30.440][Debug][datastoreset.go:190] [Master] delta update complete in: 147.17756ms
[2020-12-20 09:17:16 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:18:13.443][Debug][datastoreset.go:190] [Master] delta update complete in: 4m38.150211849s
[2020-12-20 09:18:17.934][Debug][datastoreset.go:190] [Master] delta update complete in: 4.368250464s
[2020-12-20 09:18:18.698][Debug][datastoreset.go:190] [Master] delta update complete in: 405.107611ms
[2020-12-20 09:18:23.440][Debug][datastoreset.go:190] [Master] delta update complete in: 147.788024ms
[2020-12-20 09:18:28.544][Debug][datastoreset.go:190] [Master] delta update complete in: 251.476021ms
[2020-12-20 09:18:33.414][Debug][datastoreset.go:190] [Master] delta update complete in: 121.692444ms
[2020-12-20 09:18:38.413][Debug][datastoreset.go:190] [Master] delta update complete in: 120.521519ms
[2020-12-20 09:22:21 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:23:24.204][Debug][datastoreset.go:190] [Master] delta update complete in: 4m40.91106028s
[2020-12-20 09:23:28.422][Debug][datastoreset.go:190] [Master] delta update complete in: 4.091027372s
[2020-12-20 09:23:29.415][Debug][datastoreset.go:190] [Master] delta update complete in: 122.942043ms
[2020-12-20 09:23:34.416][Debug][datastoreset.go:190] [Master] delta update complete in: 122.531084ms
[2020-12-20 09:23:39.578][Debug][datastoreset.go:190] [Master] delta update complete in: 284.757844ms
[2020-12-20 09:23:44.427][Debug][datastoreset.go:190] [Master] delta update complete in: 134.40532ms
[2020-12-20 09:23:49.444][Debug][datastoreset.go:190] [Master] delta update complete in: 151.362747ms
[2020-12-20 09:27:26 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:30:35.215][Debug][datastoreset.go:190] [Master] delta update complete in: 6m40.922712924s
[2020-12-20 09:30:41.065][Debug][datastoreset.go:190] [Master] delta update complete in: 5.652551801s
[2020-12-20 09:30:41.270][Debug][datastoreset.go:190] [Master] delta update complete in: 204.990968ms
[2020-12-20 09:30:46.443][Debug][datastoreset.go:190] [Master] delta update complete in: 150.984616ms
[2020-12-20 09:30:51.447][Debug][datastoreset.go:190] [Master] delta update complete in: 154.50481ms
[2020-12-20 09:30:56.450][Debug][datastoreset.go:190] [Master] delta update complete in: 157.355238ms
[2020-12-20 09:31:01.449][Debug][datastoreset.go:190] [Master] delta update complete in: 156.407295ms
[2020-12-20 09:32:31 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:36:17.223][Debug][datastoreset.go:190] [Master] delta update complete in: 5m10.931016326s
[2020-12-20 09:36:21.360][Debug][datastoreset.go:190] [Master] delta update complete in: 4.015486264s
[2020-12-20 09:36:22.429][Debug][datastoreset.go:190] [Master] delta update complete in: 136.1218ms
[2020-12-20 09:36:27.669][Debug][datastoreset.go:190] [Master] delta update complete in: 376.306213ms
[2020-12-20 09:36:32.433][Debug][datastoreset.go:190] [Master] delta update complete in: 139.877154ms
[2020-12-20 09:36:37.446][Debug][datastoreset.go:190] [Master] delta update complete in: 153.011974ms
[2020-12-20 09:36:42.427][Debug][datastoreset.go:190] [Master] delta update complete in: 134.886847ms
[2020-12-20 09:37:37 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:42:23.108][Debug][datastoreset.go:190] [Master] delta update complete in: 5m35.81606794s
[2020-12-20 09:42:27.279][Debug][datastoreset.go:190] [Master] delta update complete in: 4.047176295s
[2020-12-20 09:42:28.649][Debug][datastoreset.go:190] [Master] delta update complete in: 356.146401ms
[2020-12-20 09:42:33.423][Debug][datastoreset.go:190] [Master] delta update complete in: 130.210882ms
[2020-12-20 09:42:38.433][Debug][datastoreset.go:190] [Master] delta update complete in: 140.550589ms
[2020-12-20 09:42:42 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:42:43.429][Debug][datastoreset.go:190] [Master] delta update complete in: 135.198521ms
[2020-12-20 09:42:48.613][Debug][datastoreset.go:190] [Master] delta update complete in: 320.586678ms
[2020-12-20 09:47:30.583][Debug][datastoreset.go:190] [Master] delta update complete in: 4m37.290165804s
[2020-12-20 09:47:34.965][Debug][datastoreset.go:190] [Master] delta update complete in: 4.134958867s
[2020-12-20 09:47:35.431][Debug][datastoreset.go:190] [Master] delta update complete in: 139.138234ms
[2020-12-20 09:47:40.424][Debug][datastoreset.go:190] [Master] delta update complete in: 131.999373ms
[2020-12-20 09:47:45.416][Debug][datastoreset.go:190] [Master] delta update complete in: 123.400834ms
[2020-12-20 09:47:47 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:47:50.486][Debug][datastoreset.go:190] [Master] delta update complete in: 194.015026ms
[2020-12-20 09:47:55.418][Debug][datastoreset.go:190] [Master] delta update complete in: 125.194252ms
[2020-12-20 09:52:25.709][Debug][datastoreset.go:190] [Master] delta update complete in: 4m25.294444883s
[2020-12-20 09:52:29.733][Debug][datastoreset.go:190] [Master] delta update complete in: 3.750264707s
[2020-12-20 09:52:30.606][Debug][datastoreset.go:190] [Master] delta update complete in: 313.371107ms
[2020-12-20 09:52:35.425][Debug][datastoreset.go:190] [Master] delta update complete in: 132.23396ms
[2020-12-20 09:52:40.431][Debug][datastoreset.go:190] [Master] delta update complete in: 137.856836ms
[2020-12-20 09:52:45.426][Debug][datastoreset.go:190] [Master] delta update complete in: 133.700451ms
[2020-12-20 09:52:50.425][Debug][datastoreset.go:190] [Master] delta update complete in: 132.632589ms
[2020-12-20 09:52:53 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:57:58 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-12-20 09:59:38.732][Debug][datastoreset.go:190] [Master] delta update complete in: 6m43.439946293s
[2020-12-20 09:59:45.197][Debug][datastoreset.go:190] [Master] delta update complete in: 6.339484079s
[2020-12-20 09:59:45.346][Debug][datastoreset.go:190] [Master] delta update complete in: 148.957745ms
[2020-12-20 09:59:50.441][Debug][datastoreset.go:190] [Master] delta update complete in: 148.442352ms
[2020-12-20 09:59:55.449][Debug][datastoreset.go:190] [Master] delta update complete in: 156.2786ms
[2020-12-20 10:00:00.911][Debug][datastoreset.go:190] [Master] delta update complete in: 446.412856ms
[2020-12-20 10:00:05.467][Debug][datastoreset.go:190] [Master] delta update complete in: 175.070452ms

To Reproduce

There is nothing special to our setup that I can think of, so I am unsure how I can direct anyone in reproducing this.

Our setup is growing, and we didn't see this behavior from the start. So, it might have to do with the size of the environment.

Currently we are at 4.5k hosts and 35k service checks. This configuration is spread somewhat equally over 200 endpoints.

Expected behavior

We expect the LMD to be able to retrieve small, fast delta updates from the livestatus API always.

Your Environment

# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: 2.12.3)

Copyright (c) 2012-2020 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Red Hat Enterprise Linux Server
  Platform version: 7.9 (Maipo)
  Kernel: Linux
  Kernel version: 3.10.0-1160.11.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: runner-hh8q3bz2-project-322-concurrent-0
  OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
# uname -a
Linux <hostname> 3.10.0-1160.11.1.el7.x86_64 #1 SMP Mon Nov 30 13:05:31 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb opentsdb perfdata statusdata
Enabled features: api checker ido-mysql livestatus mainlog notification syslog
Icinga Web 2 Version
2.8.2
Git commit
8a89839af94a247ee2149b2336c73b8251b477c0
PHP Version
7.3.20
Git commit date
2020-08-17
Copyright
© 2013-2020 The Icinga Project

Loaded modules
Name | Version
-- | --
ipl | v0.5.0
monitoring | 2.8.2
reactbundle | 0.8.0
x509 | 1.0.0
# icinga2 daemon -C
[2020-12-20 10:44:10 +0100] information/cli: Icinga application loader (version: 2.12.3)
[2020-12-20 10:44:10 +0100] information/cli: Loading configuration file(s).
[2020-12-20 10:44:13 +0100] information/ConfigItem: Committing config item(s).
[2020-12-20 10:44:13 +0100] information/ApiListener: My API identity: dkcdcwppicnmrp01.vestasext.net
[2020-12-20 10:44:23 +0100] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 16, rate: 146.133/s (8768/min 8768/5min 8768/15min);
[2020-12-20 10:44:23 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-12-20 10:44:23 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2020-12-20 10:44:33 +0100] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 12, rate: 146.2/s (8772/min 8772/5min 8772/15min); empty in less than 1 millisecond
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 SyslogLogger.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 4649 Hosts.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 Downtime.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 2 NotificationCommands.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 274 Comments.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 4661 Notifications.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 3824 HostGroups.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 EventCommand.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 4544 Dependencies.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 202 Zones.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 200 Endpoints.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 2 ApiUsers.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 616 CheckCommands.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 LivestatusListener.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 3702 UserGroups.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1590 Users.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 35530 Services.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 7 ServiceGroups.
[2020-12-20 10:44:35 +0100] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2020-12-20 10:44:35 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2020-12-20 10:44:35 +0100] information/cli: Finished validating the configuration file(s).

200 endpoints, 202 zones, very similar in shape and size.

# icinga2 object list --type Endpoint | grep "of type 'Endpoint" | wc -l
200
# icinga2 object list --type Zone | grep "of type 'Zone" | wc -l
202

Additional context

We expect to be growing at a rate of around 300 endpoints per year in the coming years. We are at 200 endpoints now.

If this issue is related to scaling, we are curious to know any limits we are or will be challenged with.

Al2Klimov commented 3 years ago

@lippserd We've deprecated Livestatus. Will anything ever happen here?

lippserd commented 3 years ago

@lippserd We've deprecated Livestatus. Will anything ever happen here?

No, I don't think so. But that may have been fixed on the lmd side anyway.