Icinga / icingaweb2-module-director

The Director aims to be your new favourite Icinga config deployment tool. Director is designed for those who want to automate their configuration deployment and those who want to grant their “point & click” users easy access to the configuration.
https://icinga.com/docs/director/latest
GNU General Public License v2.0

Override custom properties not working for objects #1579

Closed quentinsch closed 6 years ago

quentinsch commented 6 years ago

Expected Behavior

When you add custom properties for a service on a host, they should override the default values of the service template you've chosen. The host overview says that the host has override service variables, and these should override the template. It shows the override correctly in the host overview:

[Screenshot, 2018-07-26 11:41:41: host overview showing the service variable overrides]

Current Behavior

The override properties are ignored and the default values are used in the checks. As shown in the screenshots, the properties are set but they are ignored in the check somehow. On the left the override for the check and on the right the default setting for this service check:

[Screenshot, 2018-07-26 11:45:51: override for the check (left) and default setting for this service check (right)]

So the details on the check do not show the override, although they are set in the backend. It looks like it got a bit messed up since the new version, because in the previous release (2.4.0) it worked just fine. Host details:

[Screenshots, 2018-07-26 11:47:38 and 11:47:48: host details]

UPDATE: The correct overrides DO get loaded right after an Icinga reload/restart of the main process. If you deploy a config again via the Director, the default variables are in place again...

UPDATE 2: I tried a second setup similar to the production setup. In that setup I cannot reproduce the issue, so it might be a configuration issue. It seems like a faulty config messes things up pretty bad, but the challenge is how to find this config that triggers the issue. Like most setups there are a lot of hosts in this configuration. Any pointers on how to debug this?

UPDATE 3 & WORKAROUND: OK, I found the issue. It seems that this issue occurred after upgrading, and the workaround was to clone the affected host(s). After cloning and deploying the config, the correct overrides are in place. So it seems to be related to the upgrades I did, and probably to the database content...

UPDATE 4: Unfortunately, I thought I had found the issue, but I spoke too soon. Cloning seems to help in some cases, but overrides are again being ignored unreliably. Same behaviour as in issue 1580.

UPDATE 5 & WORKAROUND: OK, to have a working workaround (it is a pretty annoying issue for the engineers on pager duty) I downgraded the Icinga2 core to version 2.8.4-1. This resolved the issue, so it is related to the upgrade of the Icinga2 core from version 2.8.4-1 to 2.9.1 in combination with the Director. I am not sure which of the two causes this issue, the Director or the core. Since I didn't downgrade the Director from the master mentioned below, I think the issue might have to do with Icinga2 itself. Because this might be related to the Icinga2 core, I filed an issue report there as well, with issue number 6522.

Possible Solution

For a temporary workaround downgrade Icinga2 to version 2.8.4-1 (Debian release). Not a permanent solution of course.

Steps to Reproduce (for bugs)

1) Edit a service check from a host
2) Set an override of a variable
3) Deploy config
4) If you look at the host details, the override has not been applied

Your Environment

quentinsch commented 6 years ago

I had a conversation with @dnsmichi regarding this issue. Hopefully he mentioned some useful information that gives new clues for resolving this issue. I tried to reproduce this in another setup, but I couldn't manage to do so. This might indicate that it has something to do with the number of checks and/or overrides? This is just a wild guess, but this information might help as well.

lazyfrosch commented 6 years ago

I guess the safest and easiest way for director would be to render all services as apply, but just bind those to a single host.

This is a problem for:

So far I don't see a downside apart from Icinga needing to resolve more apply rules. We can still ship the rules in zone-related config files for the appropriate host.

In addition this could be a chance to reduce the amount of service set config, in terms of merging individual assignments to: assign where ... || host.name in ["a", "b"]
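A minimal sketch of the merged form, assuming two hosts that previously each had their own single Service object. The host names reuse the ones from the examples in this comment, the rule name "ping-check" is made up for illustration, and "ping4" is assumed to be an existing Service template:

```icinga2
// One apply rule bound to an explicit host list replaces one
// Service object per host. Names are illustrative only.
apply Service "ping-check" {
  import "ping4"

  assign where host.name in [ "cust1-bar", "cust2-foo" ]
}
```

Any existing assign condition could be chained in front of the host list with `||`, as in the expression quoted above.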

pre problem

object Service "cust1-foo" {
    host_name = "cust1-bar"
    import "ping4"

    vars.test = "1"
}

/** Service Set 'test2' **/

object Service "test-set-test-2" {
    host_name = "cust2-foo"
    import "ping4"

    import DirectorOverrideTemplate
}

new config

apply Service "cust1-foo" {
    import "ping4"

    vars.test = "1"

    assign where host.name == "cust1-bar"
}

/** Service Set 'test2' **/

apply Service "test-set-test-2" {
    import "ping4"

    assign where host.name == "cust2-foo"
    import DirectorOverrideTemplate
}

template Service DirectorOverrideTemplate {
  if (vars) {
    vars += host.vars[DirectorOverrideVars][name]
  } else {
    vars = host.vars[DirectorOverrideVars][name]
  }
}

Thomas-Gelf commented 6 years ago

@dnsmichi: before we evaluate changing all Service Objects to Service Apply Rules I'd love to learn more about the expected performance impact. Modeled roughly according to a "random" (not the biggest) real-world customer setup, given the following numbers:

150.000 Checks, configured as:

- 10.000 Hosts
- 20.000 Single Services
- 120.000 Services, generated by 180 Apply Rules

What's the estimated increase in startup time when we change those 20.000 Single Services to 20.000 Apply Rules? Do we have hard-coded tuning for host.name == "some-string" to get O(1) performance? Do we iterate 20.000 times over 10.000 hosts? Is it something in between?

Also, did anyone evaluate the cost of restoring the former behavior with fixing the problem involved in this breaking change in a different way?

Thanks a lot, Thomas

Thomas-Gelf commented 6 years ago

Thanks for your help, I did the test on my own.

Small setup, 5000 Hosts, 4 Single Services per Host. Flat config file, services have once been single objects, once the exact same services transformed into apply rules assigned where host.name == "<hostname>". In addition to that there was the default conf.d. I didn't bother to clean it up, so there was one additional Service per Host. Final result is the same, both scenarios showed the same net number of Hosts and Services.

Measured just the time for icinga2 daemon -C:

During those 1,5 minutes 8 cores were 99% busy, and the work queue claimed it would be empty in -2147483648 days. Seems that my assumption was correct.

Did the same tests with 30.000 Hosts, still 4 Services per Host:

We're running larger setups.

lazyfrosch commented 6 years ago

Well our problem is not really the activation order, it is the commit order.

Commits have never happened in order or with respect to object dependencies.

I guess the problem was raised by this change: https://github.com/Icinga/icinga2/commit/d9010c7b9faaec137f3e195b370edbb406c37d76

[Screenshot, 2018-08-09 11:11:30]

Notes:

Later in that function OnConfigAllLoaded is run in order of internal dependencies.

lippserd commented 6 years ago

Hi,

We discussed this today and agreed that it is best to make get_host() work again. Though we may not have designed those accessor functions for this use case in the first place, it used to work without any problems in the past. At least, we did not receive any issue reports until now. Switching the Director to use Apply Rules is not really an option because of the huge performance impact. In our discussion we evaluated some other solutions which we may introduce in the future. For now they are just here for reference:

  1. get_host(), get_service() and the like should work reliably in our DSL. This could be achieved by using futures.

  2. Improve the performance of Apply Rules by optimizing specific lookups, e.g. host.name == x.

Thanks to all parties for sorting this out. Of course we will try our best to make get_host() work again asap.

Cheers, Eric

Thomas-Gelf commented 6 years ago

Director will now warn in the Startup Log when get_host() fails; this has been implemented with #1595. @quentinsch: thanks a lot for your patience and help in tracking this down. I'll close this issue as there is not much more Director can do for the time being. Please follow the issue in the Icinga 2 issue tracker for a final fix. In the meantime 2.8.x should work fine for you.

Cheers, Thomas

quentinsch commented 6 years ago

Thanks a lot guys for sorting this out. Great work! @Thomas-Gelf, I already upgraded back to 2.9.x in combination with apply rules. In my case I moved away from Service Sets. Anyhow, once this is fixed it will help others who hit this issue.

Al2Klimov commented 6 years ago

Flat config file, services have once been single objects, once the exact same services transformed into apply rules assigned where host.name == "<hostname>". (...)

– @Thomas-Gelf, https://github.com/Icinga/icingaweb2-module-director/issues/1579#issuecomment-411520638

Did you convert the services 1:1? If yes, you're absolutely right. On my MacBook Pro 5000x4 service objects take 7s and 5000x4 applys more than 3m (I've interrupted it, I'm not as patient as you).

However, 5000 hosts and 4 applys (as already suggested by @lazyfrosch) take 15s with host.name in myHosts and (again) 7s with myHosts[host.name]. (You were right @Crunsher, there's not much difference.)

30000x10 – again, 1.5m vs 1.5m.

"And what's the advantage of all this?"

You'd have less data over the net, on the disk and... get_host() would purr like a cat! (3 "selling arguments", how could any customer say no?)

"Talking is cheap, show me teh code!"

Fine...

Objects generator

#!/usr/bin/perl
# Arguments: <number of hosts> <services per host>
# Prints one Host object per host and one Service object per host/service pair.
use strict;
use warnings;

print "object CheckCommand \"silence\" {\n  command = [ \"/bin/true\" ]\n}\n";

for (my $i = 0; $i < $ARGV[0]; ++$i) {
    print "object Host \"dummy${i}\" {\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";

    for (my $j = 0; $j < $ARGV[1]; ++$j) {
        print "object Service \"dummy${j}\" {\n  host_name = \"dummy${i}\"\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";
    }
}

Applys generator

#!/usr/bin/perl
# Arguments: <number of hosts> <services per host>
# Prints one Host object per host and one apply rule per host/service pair,
# each rule assigned to exactly one host by name.
use strict;
use warnings;

print "object CheckCommand \"silence\" {\n  command = [ \"/bin/true\" ]\n}\n";

for (my $i = 0; $i < $ARGV[0]; ++$i) {
    print "object Host \"dummy${i}\" {\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";

    for (my $j = 0; $j < $ARGV[1]; ++$j) {
        print "apply Service \"dummy${j}\" {\n  assign where host.name == \"dummy${i}\"\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";
    }
}

Bad-design applys compressor

#!/usr/bin/perl
# Arguments: <number of hosts> <number of apply rules>
# Prints all Host objects, an array of all host names, and one apply rule
# per service that matches via "host.name in <array>".
use strict;
use warnings;

my @hosts = ();

print "object CheckCommand \"silence\" {\n  command = [ \"/bin/true\" ]\n}\n";

for (my $i = 0; $i < $ARGV[0]; ++$i) {
    print "object Host \"dummy${i}\" {\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";

    # Collect the quoted host name for the array literal.
    push @hosts, "\"dummy${i}\"";
}

my $hostList = join ", ", @hosts;

print "var allTehHosts = [ $hostList ]\n";

for (my $j = 0; $j < $ARGV[1]; ++$j) {
    print "apply Service \"dummy${j}\" use(allTehHosts) {\n  assign where host.name in allTehHosts\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";
}

Applys compressor

#!/usr/bin/perl
# Arguments: <number of hosts> <number of apply rules>
# Prints all Host objects, a dictionary keyed by host name, and one apply
# rule per service that matches via a dictionary lookup on host.name.
use strict;
use warnings;

my @hosts = ();

print "object CheckCommand \"silence\" {\n  command = [ \"/bin/true\" ]\n}\n";

for (my $i = 0; $i < $ARGV[0]; ++$i) {
    print "object Host \"dummy${i}\" {\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";

    # Collect one "name = true" dictionary entry per host.
    push @hosts, "  \"dummy${i}\" = true\n";
}

my $hostList = join "", @hosts;

print "var allTehHosts = {\n${hostList}}\n";

for (my $j = 0; $j < $ARGV[1]; ++$j) {
    print "apply Service \"dummy${j}\" use(allTehHosts) {\n  assign where allTehHosts[host.name]\n  check_command = \"silence\"\n  vars.ihascheezburger = true\n}\n";
}
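
For orientation, this is the configuration the "Applys compressor" above prints when called with the arguments 2 and 1 (two hosts, one apply rule). It was traced by hand from the script, not taken from the thread:

```icinga2
object CheckCommand "silence" {
  command = [ "/bin/true" ]
}
object Host "dummy0" {
  check_command = "silence"
  vars.ihascheezburger = true
}
object Host "dummy1" {
  check_command = "silence"
  vars.ihascheezburger = true
}
var allTehHosts = {
  "dummy0" = true
  "dummy1" = true
}
apply Service "dummy0" use(allTehHosts) {
  assign where allTehHosts[host.name]
  check_command = "silence"
  vars.ihascheezburger = true
}
```

The "bad design" variant differs only in that allTehHosts is an array and the rule matches with host.name in allTehHosts.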

prefix/sbin/icinga2 daemon -C, 5000x4 services

[2018-08-29 11:45:42 +0200] information/cli: Icinga application loader (version: v2.9.1-159-g9fb4ffdef; debug)
[2018-08-29 11:45:42 +0200] information/cli: Loading configuration file(s).
[2018-08-29 11:45:43 +0200] information/ConfigItem: Committing config item(s).
[2018-08-29 11:45:46 +0200] information/ApiListener: My API identity: CENSORED
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 4 Endpoints.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 5002 Hosts.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 12 Notifications.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 20011 Services.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 6 Zones.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 2 Users.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 CENSORED.
[2018-08-29 11:45:49 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2018-08-29 11:45:49 +0200] information/ScriptGlobal: Dumping variables to file 'CENSORED/var/cache/icinga2/icinga2.vars'
[2018-08-29 11:45:49 +0200] information/cli: Finished validating the configuration file(s).

prefix/sbin/icinga2 daemon -C, 5000 hosts, 4 applys (bad design)

[2018-08-29 12:03:21 +0200] information/cli: Icinga application loader (version: v2.9.1-159-g9fb4ffdef; debug)
[2018-08-29 12:03:21 +0200] information/cli: Loading configuration file(s).
[2018-08-29 12:03:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-29 12:03:22 +0200] information/ApiListener: My API identity: CENSORED
[2018-08-29 12:03:31 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 3.28333/s (197/min 197/5min 197/15min);
[2018-08-29 12:03:32 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:03:32 +0200] information/WorkQueue: #5 (CENSORED) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:03:32 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 4 Endpoints.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 5002 Hosts.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 12 Notifications.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 20011 Services.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 6 Zones.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 2 Users.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 CENSORED.
[2018-08-29 12:03:36 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2018-08-29 12:03:36 +0200] information/ScriptGlobal: Dumping variables to file 'CENSORED/var/cache/icinga2/icinga2.vars'
[2018-08-29 12:03:36 +0200] information/cli: Finished validating the configuration file(s).

prefix/sbin/icinga2 daemon -C, 5000 hosts, 4 applys

[2018-08-29 12:06:34 +0200] information/cli: Icinga application loader (version: v2.9.1-159-g9fb4ffdef; debug)
[2018-08-29 12:06:34 +0200] information/cli: Loading configuration file(s).
[2018-08-29 12:06:34 +0200] information/ConfigItem: Committing config item(s).
[2018-08-29 12:06:35 +0200] information/ApiListener: My API identity: CENSORED
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 4 Endpoints.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 5002 Hosts.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 12 Notifications.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 20011 Services.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 6 Zones.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 2 Users.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 CENSORED.
[2018-08-29 12:06:41 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2018-08-29 12:06:41 +0200] information/ScriptGlobal: Dumping variables to file 'CENSORED/var/cache/icinga2/icinga2.vars'
[2018-08-29 12:06:41 +0200] information/cli: Finished validating the configuration file(s).

prefix/sbin/icinga2 daemon -C, 30000x10 services

[2018-08-29 12:07:56 +0200] information/cli: Icinga application loader (version: v2.9.1-159-g9fb4ffdef; debug)
[2018-08-29 12:07:56 +0200] information/cli: Loading configuration file(s).
[2018-08-29 12:08:10 +0200] information/ConfigItem: Committing config item(s).
[2018-08-29 12:08:20 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:08:30 +0200] information/ApiListener: My API identity: alexanders-mbp.int.netways.de
[2018-08-29 12:08:40 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:08:40 +0200] information/WorkQueue: #5 (RedisWriter) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:08:40 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 4 Endpoints.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 30002 Hosts.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 12 Notifications.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 300011 Services.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 6 Zones.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 2 Users.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 RedisWriter.
[2018-08-29 12:09:31 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2018-08-29 12:09:32 +0200] information/ScriptGlobal: Dumping variables to file '/Users/aklimov/NET/WS/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2018-08-29 12:09:32 +0200] information/cli: Finished validating the configuration file(s).

prefix/sbin/icinga2 daemon -C, 30000 hosts, 10 applys

[2018-08-29 12:13:11 +0200] information/cli: Icinga application loader (version: v2.9.1-159-g9fb4ffdef; debug)
[2018-08-29 12:13:11 +0200] information/cli: Loading configuration file(s).
[2018-08-29 12:13:13 +0200] information/ConfigItem: Committing config item(s).
[2018-08-29 12:13:18 +0200] information/ApiListener: My API identity: alexanders-mbp.int.netways.de
[2018-08-29 12:13:23 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 4, rate: 3.2/s (192/min 192/5min 192/15min);
[2018-08-29 12:13:28 +0200] information/WorkQueue: #5 (RedisWriter) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:13:28 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:13:28 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 4 Endpoints.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 30002 Hosts.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 12 Notifications.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 300011 Services.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 6 Zones.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 2 Users.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 RedisWriter.
[2018-08-29 12:14:41 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2018-08-29 12:14:41 +0200] information/ScriptGlobal: Dumping variables to file '/Users/aklimov/NET/WS/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2018-08-29 12:14:41 +0200] information/cli: Finished validating the configuration file(s).

Thomas-Gelf commented 6 years ago

Please read the scenario description above, that's what we have been talking about. The suggested "solution" was to render every single service deployed by the Director as an Apply Rule by default. I opposed this for obvious performance reasons. Sources for single Service or single Service Set assignments to single hosts are usually external, like configuration management tools and CMDBs.

There will be better ways of designing a configuration for every scenario, but that's not the point. Transforming every rendered configuration out in the wild into your "bad design" example by default, as a workaround for the problem that Icinga 2.9 broke the last possibility of defining Service objects based on properties of the related Host, wouldn't be good advice. Please see #6534 for simple related configuration examples.

Al2Klimov commented 6 years ago

  1. I think I failed to mention that my example above wasn't 1.5m vs 1.5m, but 1m38s vs 1.5m.
  2. I'm wondering why these management tools don't just set custom vars. IMO that is much easier, and definitely good advice.
  3. I'm afraid you misunderstood me. I pointed out four scenarios (flat objects, flat applies, the @lazyfrosch way aka "bad design" and my extension aka "[not] bad design"), and my suggestion was to use the last one (which wastes less disk space and network transfer than the first one and is a bit faster, 1m38s vs 1.5m, than the first one) in the Director, not in any external tool.
  4. I'm not quite sure whether it was "the last one", there's my fourth scenario, too.

And the most important thing... (please read on)

@bobapple just told me that @gunnarbeutner had guaranteed you that the feature we broke in v2.9 would work. IMO @gunnarbeutner failed to communicate that (internally). If he had communicated it, we would not be having such heated discussions (now). All/most of us were of the opinion that you built that Director feature "blinkered", but instead you seem to have clearly communicated this with at least one of us. Well done and sorry for my thumbs down (I didn't know better either, CC @dnsmichi).

However, this opinion of mine still hasn't changed. Yes, it's not your (Director's) fault at all, but the management tools'. But IMO it doesn't matter at all whether my config tool is Vim or CMDB-whatever: both should produce "politically correct" (let me know whether there's a better term for that) config, and that's not that difficult (custom vars...). All of us I2 devs + BE have communicated the I2 config architecture clearly enough. Actively supporting "non politically correct" config opens a bottomless pit... like this one.

Thomas-Gelf commented 6 years ago

-> 2. No doubt. But we cannot forbid people to generate a few thousand single services in addition to the many thousands generated by a few apply rules. The scenario above shows numbers modeled after a real-world example.

-> 3. Would be possible for equal services. We could introduce checksums for faster de-duplication. But as soon as someone syncs extra flags for reference (like vars.cmdb_asset_id = "DISK02343") we're screwed again.

For the rest: IMO we shouldn't assume that we can break features unless someone explicitly declared to use them. There is no piece of documentation stating "you should use get_host() only at runtime but not at parse time", so why should someone expect it to stop working when it used to work as expected?

The original problem we're trying to solve with get_host(host_name) is that there is no host property available in the service Object, while it exists in the Apply Rule. This has technical reasons, but is hard to explain to the average user. After finding the get_host(host_name) workaround with the help of @gunnarbeutner, I not only used this feature for the Director but also proposed it as a solution for users facing similar issues. Have a look at the linked issue, this is a perfectly valid use-case.
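
A minimal sketch of the get_host(host_name) workaround described above, modeled on the DirectorOverrideTemplate shown earlier in this thread. It is not the Director's actual template: the template name and the "_override_servicevars" key are assumptions made for illustration, and the warning only mirrors the idea of the startup-log message mentioned for #1595.

```icinga2
// Sketch only: template name and override key are assumed, not confirmed.
template Service "host var overrides (sketch)" {
  // Inside an apply rule `host` is already in scope. In a single
  // Service object it is not, so resolve it through host_name.
  if (!host) {
    var host = get_host(host_name)
  }

  if (!host) {
    // This is the case that started failing at config load with 2.9.
    log(LogWarning, "config", "Host lookup failed for service '" + name + "'")
  } else {
    // Merge the per-service overrides stored on the host.
    if (vars) {
      vars += host.vars["_override_servicevars"][name]
    } else {
      vars = host.vars["_override_servicevars"][name]
    }
  }
}
```

A single Service object would pick this up with an import, the same way the examples earlier in the thread import DirectorOverrideTemplate.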

Single Services are not against the principles of the I2 architecture, both Objects and Apply Rules are a thing, both have their use-cases. I do not see why above example (150.000 Checks, configured as: 10.000 Hosts, 20.000 Single Services plus 120.000 Services, generated by 180 Apply-Rules) shouldn't be considered "politically correct". More than 85% of those services are in fact generated by a few Apply Rules, and I'm sure there is a reason why there are those 14% Single Services.

Al2Klimov commented 6 years ago

2: Of course, we cannot. If one wants to jump in front of a train, they can do it. But they have to deal with the consequences – same thing here.

3: Not really, you just have to fill not just true into my suggested dictionary, but the properties which differ (or all of them):

var allTehHosts = {
  "lolcatz' DB server" = {
    "vars" = {
      "ihascheezburger" = true
    }
  }
}
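
A sketch of how an apply rule might consume such a dictionary. It assumes that the variables captured with use(allTehHosts) are visible in the rule body as well as in assign where, which is not verified anywhere in this thread, and it reuses the "silence" CheckCommand from the generators above:

```icinga2
apply Service "dummy0" use(allTehHosts) {
  // Match only hosts that have an entry in the dictionary.
  assign where allTehHosts.contains(host.name)
  check_command = "silence"

  // Copy the per-host differences recorded in the dictionary.
  vars += allTehHosts[host.name].vars
}
```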

You're "sure there is a reason why there are those 14% Single Services"... so I assume you don't know it, right? If/once you know, please share it with me. Right now I can't imagine any use case for even one flat service object with actual advantages over an apply rule. IMO the management becomes a lot easier once you manage everything over custom vars (and it makes it reusable of course).

Thomas-Gelf commented 6 years ago

Those 14% are there because many teams in that company have the possibility to deploy custom checks by adding them to Puppet via project- and team-specific YAML files checked into their GIT for Hiera. Others:

There is absolutely nothing wrong with Single Services. They are faster, they cause less overhead in the IDO, they are easier to understand for the user. It's much simpler to just click "Add Service" instead of tweaking an existing apply rule or modeling a custom dictionary following specific rules to fit an existing Apply Rule.

Apply Rules are a nice feature when it comes to rolling out many mostly identical Services. Constructs with apply for may also have their use-case, but they are mostly a "copy service properties into a host var" technique. You have to be very careful when dealing with them. A typo in your Dictionary key? The service will silently be lost when no Rule matches. A typo in your host_name in a Service Object? Fails immediately.
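
To illustrate the two failure modes, here is a small sketch. The host "web01", its address, and the misspellings are made up, and "hostalive" and "disk" are assumed to come from the ITL and plugin check command definitions shipped with Icinga 2:

```icinga2
object Host "web01" {
  check_command = "hostalive"
  address = "192.0.2.10"

  // Intended key is "disks". With the typo "disk" the apply-for rule
  // below finds nothing to iterate over on this host.
  vars.disk["disk /"] = {
    disk_partitions = "/"
  }
}

apply Service for (disk => config in host.vars.disks) {
  check_command = "disk"
  vars += config
}

// A single Service object with a misspelled host name, by contrast,
// is rejected during config validation because the host does not exist.
object Service "disk /" {
  host_name = "web1"   // typo for "web01"
  check_command = "disk"
}
```

In the first case no disk service is created for web01 and validation does not fail because of it. In the second case icinga2 daemon -C reports the missing host.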

We are off topic, I'll stop here.