Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.01k stars 577 forks source link

Evaluate zone in zone inclusion for zones.d config sync #7530

Closed dnsmichi closed 5 years ago

dnsmichi commented 5 years ago

General

Since this issue is linked inside the Director with telling you to update the configuration accordingly while a possible fix is evaluated, here's a quick solution in order to make this work again. The following restores the Zone/Endpoint objects as config objects outside of zones.d in your master/satellite's zones.conf with rendering them as external objects in the Director. This is the recommended way of setting up distributed environments according to the docs.

Example for a 3 level setup with the masters and satellites knowing about the zone hierarchy outside defined in zones.conf:

object Endpoint "icinga-master1.localdomain" {
  //define 'host' attribute to control the connection direction on each instance
}

object Endpoint "icinga-master2.localdomain" {
  //...
}

object Endpoint "icinga-satellite1.localdomain" {
  //...
}

object Endpoint "icinga-satellite2.localdomain" {
  //...
}

/***************/
// Zone hierarchy with endpoints, required for the trust relationship and that the cluster config sync knows which zone directory defined in zones.d needs to be synced to which endpoint.
// That's no different to what is explained in the docs as basic zone trust hierarchy, and is intentionally managed outside in zones.conf there.

object Zone "master" {
  endpoints = [ "icinga-master1.localdomain", "icinga-master2.localdomain" ] 
}

object Zone "satellite" {
  endpoints = [ "icinga-satellite1.localdomain", "icinga-satellite2.localdomain" ]
  parent = "master" // trust
}

Prepare the above configuration on all affected nodes, satellites are likely uptodate already. Then continue with the steps below.

  • backup your database, just to be on the safe side
  • create all non-external Zone/Endpoint-Objects on all related Icinga Master/Satellite-Nodes (manually in your local zones.conf)
  • while doing so please do NOT restart Icinga, no deployments
  • change the type in the Director DB:
UPDATE icinga_zone SET object_type = 'external_object' WHERE object_type = 'object';
UPDATE icinga_endpoint SET object_type = 'external_object' WHERE object_type = 'object';
  • render and deploy a new configuration in the Director. It will state that there are no changes. Ignore it, deploy anyways

That's it. All nodes should automatically restart, triggered by the deployed configuration via cluster protocol.

Problem

2.11 hardens the config sync thus only including directories in zones.d which have been configured outside in zones.conf.

When the following tree exists, the satellite zone is not included anymore:

cat zones.d/master/zones-satellite.conf

object Zone "satellite" { ... }

cat zones.d/satellite/agent-host.conf

object Host "agent" { ... }

The host object agent is not synced to the satellite zone, as the zone is not included outside in zones.conf.

While this implementation wasn't supported for 5 years now, it somehow worked because we didn't care about the directory content in zones.d. With making the config sync more robust, this is more strict with registered zone objects.

Describe the solution you'd like

Evaluate a way of allowing the Zone object configuration in zones.d, and deferring the actual object inclusion after all directories have been loaded. This needs a different algorithm with unregistering the config items from unknown zones before actually activating them.

Describe alternatives you've considered

Reverting #7120 which is not an option here.

Additional context

7519

https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/#config-sync-zones-in-zones

dnsmichi commented 5 years ago

I'll sum up the things I've tried thus far. That being said, I am taking a break with other projects, 2.11 was like "servus, burnout".

Workflow

The config compiler collects all files from disk, and compiles the configuration content into actual config items. Such an item can be an object, a template or another expression, e.g. an apply rule.

These config items are then registered, happening in ConfigItem::Register(). This is the place where we may collect an object when its m_Zone String attribute is specified, and store this globally in order to know about the configured zones.

Note: This is still a problem, since the configitem is not validated yet. It may break and then the other included objects from zones.d are registered, but actually referring to a broken zone.

Delete config items where we don't have collected a zone name

This is tricky. ConfigItem::CommitNewItems holds the steps, and we really need to unregister all config items before they are committed into actual config objects.

Interfaces

Not only the config compiler from static files is affected, also other layers just as the runtime API portions.

$ grep -r CommitItems lib/
lib//config/configitem.cpp:bool ConfigItem::CommitItems(const ActivationContext::Ptr& context, WorkQueue& upq, std::vector<ConfigItem::Ptr>& newItems, bool silent)
lib//config/configitem.cpp: if (!CommitItems(scope.GetContext(), upq, newItems, true))
lib//config/configitem.hpp: static bool CommitItems(const ActivationContext::Ptr& context, WorkQueue& upq, std::vector<ConfigItem::Ptr>& newItems, bool silent = false);
lib//cli/daemonutility.cpp: bool result = ConfigItem::CommitItems(ascope.GetContext(), upq, newItems);
lib//remote/configobjectutility.cpp:        if (!ConfigItem::CommitItems(ascope.GetContext(), upq, newItems, true)) {

ConfigItem::CommitItems() is the interface function under which the implementation must take place.

lib/config//vmops.hpp:      item.Compile()->Register();

also registers the object. This originates from the config item builder and the expression created by the config parser.

                $$ = new ObjectExpression(abstract, std::unique_ptr<Expression>($3), std::unique_ptr<Expression>($4),
                        std::move(filter), context->GetZone(), context->GetPackage(), std::move(*$5), $6, $7,
                        std::unique_ptr<Expression>($9), DebugInfoRange(@2, @7));
                delete $5;

Previous Attempt

The previous fix was to just ignore directories inside the include collection functions where no Zone config item existed. This fix isn't the best in this region, but it doesn't need to go into the "load everything into memory and then decide" route. With many zones.d directories actually containing faulty content, reverting the patch also decreases performance with loading the configuration.

That being said, in order to restore the old behavior, and still only include known directories from Zone config objects, we need to keep performance in mind.

New attempt

The above sounds easy, actually it isn't within the fragile parts of the config compiler.

The problems

I don't know where to put the lines with unregistering config items which are not in the globally collected zones list.

Things done thus far

Collecting the zones

diff --git a/lib/config/configitem.hpp b/lib/config/configitem.hpp
index 0b7ee86a3..79e4356d9 100644
--- a/lib/config/configitem.hpp
+++ b/lib/config/configitem.hpp
@@ -92,6 +92,7 @@ private:

        typedef std::vector<String> IgnoredItemList;
        static IgnoredItemList m_IgnoredItems;
+       static ItemMap m_RegisteredZoneItems;

        static ConfigItem::Ptr GetObjectUnlocked(const String& type,
                const String& name);

diff --git a/lib/config/configitem.cpp b/lib/config/configitem.cpp
index 5c9f0dd55..212826f87 100644
--- a/lib/config/configitem.cpp
+++ b/lib/config/configitem.cpp
@@ -33,6 +33,7 @@ ConfigItem::TypeMap ConfigItem::m_Items;
 ConfigItem::TypeMap ConfigItem::m_DefaultTemplates;
 ConfigItem::ItemList ConfigItem::m_UnnamedItems;
 ConfigItem::IgnoredItemList ConfigItem::m_IgnoredItems;
+ConfigItem::ItemMap ConfigItem::m_RegisteredZoneItems;

 REGISTER_FUNCTION(Internal, run_with_activation_context, &ConfigItem::RunWithActivationContext, "func");

@@ -341,6 +342,22 @@ void ConfigItem::Register()
                if (m_DefaultTmpl)
                        m_DefaultTemplates[m_Type][m_Name] = this;
        }
+
+       /*
+        * Register zone objects globally
+        * in order to drop unknown zones
+        * later. Don't do this in the constructor
+        * as there is no guarantee that this is used.
+        */
+       if (!m_Zone.IsEmpty()) {
+#ifdef I2_DEBUG
+               Log(LogDebug, "ConfigItem")
+                       << "Config item '" << m_Name << "' << with zone '" << m_Zone << "'.";
+#endif /* I2_DEBUG */
+
+               if (m_Type == Type::GetByName("Zone"))
+                       m_RegisteredZoneItems[m_Zone] = this;
+       }
 }

Ugly with hardcoding the Type name, but I don't see another way.

Unregister items which are not registered in the zones

After all items have been committed and OnConfigLoaded() was called already. I don't know which implications this has at this point, e.g. with triggering signals for features and what not.

@@ -602,6 +618,39 @@ bool ConfigItem::CommitItems(const ActivationContext::Ptr& context, WorkQueue& u
                return false;
        }

+#ifdef I2_DEBUG
+       String zones;
+       for (auto kv : m_RegisteredZoneItems) {
+               zones += "Zone name : " + kv.first + " ";
+       }
+
+       Log(LogCritical, "ConfigItem")
+               << "New items: " << newItems.size() << " Registered zone names: " << zones;
+#endif /* I2_DEBUG */
+
+       for (auto item : newItems) {
+               if (item->m_Zone.IsEmpty())
+                       continue;
+
+               /* Only register items as objects where we have
+                * collected a zone item before.
+                * Zone items may be defined in zones.d again,
+                * and "zone inception" means that we must
+                * first collect everything, and now ignore
+                * unwanted items. Do this before Commit().
+                *
+                * This may slow down the config compiler.
+                * Rationale: https://github.com/Icinga/icinga2/issues/7530
+                */
+               if (m_RegisteredZoneItems.find(item->m_Zone) == m_RegisteredZoneItems.end()) {
+#ifdef I2_DEBUG
+                       Log(LogCritical, "ConfigItem")
+                               << "Ignoring config item '" << item->m_Name << "'. Zone '" << item->m_Zone << "' is not configured.";
+#endif /* I2_DEBUG */
+                       item->Unregister();
+               }
+       }
+
        ApplyRule::CheckMatches(silent);

        if (!silent) {

Doesn't work

If put after having committed everything, and then unregistering the items, it doesn't work as for some reason, the global zone collection doesn't work atm.

[2019-09-25 09:55:08 +0200] critical/ConfigItem: New items: 0 Registered zone names:
[2019-09-25 09:55:08 +0200] critical/ConfigItem: New items: 0 Registered zone names:
[2019-09-25 09:55:08 +0200] critical/ConfigItem: New items: 1 Registered zone names:
[2019-09-25 09:55:08 +0200] information/cli: Icinga application loader (version: v2.11.0-13-g7ff26b680; debug)
[2019-09-25 09:55:08 +0200] information/cli: Loading configuration file(s).
[2019-09-25 09:55:08 +0200] information/ConfigItem: Committing config item(s).
[2019-09-25 09:55:08 +0200] information/ApiListener: My API identity: mbpmif.int.netways.de
[2019-09-25 09:55:09 +0200] critical/ConfigItem: New items: 294 Registered zone names:
[2019-09-25 09:55:09 +0200] critical/ConfigItem: Ignoring config item 'custom-check'. Zone 'global-templates' is not configured.

Apparently, global-templates is configured as by default.

Conclusio

For reasons I currently don't understand, ConfigItem::Register() is not called for all objects. Neither am I convinced that the unregister-after-or-before-commit really works in this regard, and when it is implemented, it doesn't break other things inside the config compiler. Highly likely we need to rewrite large parts of the config compiler logic to allow these specific exceptions.

Code is located in feature/zones-in-zones-inclusion.

dnsmichi commented 5 years ago

@Al2Klimov has implemented a different algorithm which requires additional test rounds whether this really re-implements the old behaviour while keeping the requested fix for 2.11 intact. The latter caused problems for 4+ years now, so still not a good candidate for a revert. Also, possible impacts on the speed of the config compiler need to be taken into account.

Last but not least, the zone loading needs to be tested on both, a config master, and all involved secondary master, satellite and agent instances.

neubi4 commented 5 years ago

If you can provide snapshot packages for centos 7, we are ready to test them.

Baboon92 commented 5 years ago

ref/NC/640333

dnsmichi commented 5 years ago

@neubi4 not yet, before going in here there's some other issues to resolve with the config sync. Current plans are to evaluate this in CW44 again.

dnsmichi commented 5 years ago

Resources blocked by