Icinga / icingaweb2-module-director

The Director aims to be your new favourite Icinga config deployment tool. Director is designed for those who want to automate their configuration deployment and those who want to grant their “point & click” users easy access to the configuration.
https://icinga.com/docs/director/latest
GNU General Public License v2.0

Scaling issues with assigned Hostgroups #1251

Open lazyfrosch opened 7 years ago

lazyfrosch commented 7 years ago

Host objects

7390 objects have been defined, 654 of them are templates, 1851 related group objects have been created

Adding an assign_filter to a single hostgroup, which would match fewer than 10 hosts, fails with:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 632 bytes) in /usr/share/icingaweb2/library/vendor/Zend/Db/Statement/Pdo.php on line 219

~When going for preview of that group: (other error)~

Removing the assign_filter also leads to a re-calculation, even though the filter is empty now:

Fatal error: Maximum execution time of 30 seconds exceeded in /usr/share/icingaweb2/library/Icinga/Data/Filter/FilterQueryString.php on line 226

Also related is #1250 (Change applied here)

lazyfrosch commented 7 years ago

Still in investigation.

Things also to note:

lazyfrosch commented 7 years ago

When changing one of the hosts that would have matched, the group instantly got applied.

icinga_host_resolved_var still empty

Thomas-Gelf commented 7 years ago

Don't worry, *_resolved_var has been postponed and is not in use yet. Recalculating filters is tricky. When removing the assign_filter from a single Hostgroup, recalculation must be triggered, as the filter could formerly have matched a bunch of hosts.
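To illustrate why removing a filter still forces a recalculation, here is a minimal Python sketch. It is not the Director's actual membership resolver: the `resolve` function, the data layout and the host names are assumptions made purely for illustration.

```python
def resolve(hosts, groups):
    """Recompute all (host, group) mappings from scratch.

    hosts:  {host_name: {var_name: value}}
    groups: {group_name: predicate or None}  (None means no assign_filter)
    """
    mappings = set()
    for group, predicate in groups.items():
        if predicate is None:
            continue  # a group without a filter gets no rule-based members
        for host, host_vars in hosts.items():
            if predicate(host_vars):
                mappings.add((host, group))
    return mappings


# Hypothetical example data
hosts = {
    "web1": {"application": "ICINGA"},
    "db1": {"application": "MYSQL"},
}
groups = {"ICINGA": lambda v: v.get("application") == "ICINGA"}

before = resolve(hosts, groups)

# Removing the filter: mappings stored earlier are now stale, so the
# resolver has to run again to find out which ones must be deleted.
groups["ICINGA"] = None
after = resolve(hosts, groups)
stale = before - after
```

Even though the filter is empty afterwards, `stale` is not: the previously stored mapping for `web1` has to be found and removed.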

My local test environment has 2000 hosts and about 50 Hostgroups with apply rules generating more than 10,000 mappings. Having a Host/Hostgroup relation of 3:1 is ... special. Probably an indication of Hostgroups (mis-)used for a special purpose, like permissions, combinations of properties, whatever. But even if this looks unusual to me, I'm pretty sure there is a valid reason for this.

There is a lot of room for tuning with various tricks when it comes to the membership resolver. Still, I wouldn't have released it if in doubt about its performance. To give me a better understanding of your setup, could you please let me know what this query is telling you:

SELECT COUNT(DISTINCT assign_filter) FROM icinga_hostgroup;

Thanks, Thomas

Thomas-Gelf commented 7 years ago

NB: while the first and last error above could possibly be a consequence of your applied patch, the one in the middle is quite strange. There is definitely something wrong with the type-casting in that class, but I do not understand where and why this would affect the config preview of a Group. Could you please let me have your exact Git ref and any patches you have applied?

lazyfrosch commented 7 years ago

Yeah, it's one of the largest environments I guess. Hostgroups are used for combinations of Applications and Stages.

Example:

ICINGA -> host.vars.application == "ICINGA"
ICINGA_PROD -> host.vars.application == "ICINGA" && host.vars.application == "PROD"
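As a rough sketch of how such per-application and per-stage groups would be matched, the Python snippet below evaluates simple equality filters against a host's custom variables. The `matches` and `groups_for` helpers are hypothetical, and the separate `stage` variable is an assumption for illustration; it is not taken from the setup described here.

```python
def matches(host_vars, conditions):
    """Return True if every (variable, expected value) pair matches."""
    return all(host_vars.get(name) == value for name, value in conditions.items())


# Hypothetical hostgroup filters; `stage` is an assumed variable name.
hostgroup_filters = {
    "ICINGA": {"application": "ICINGA"},
    "ICINGA_PROD": {"application": "ICINGA", "stage": "PROD"},
}


def groups_for(host_vars):
    """List the names of all hostgroups whose filter matches the host."""
    return sorted(
        name for name, cond in hostgroup_filters.items() if matches(host_vars, cond)
    )


prod_host = {"application": "ICINGA", "stage": "PROD"}
test_host = {"application": "ICINGA", "stage": "TEST"}
```

With one group per application plus one per application/stage pair, the number of groups grows quickly with the number of applications, which fits the host-to-group ratio discussed above.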

But for now, I'm just experimenting with all of the hosts and about 200 hostgroups with a filter.

These have been set during sync, and re-calculation worked in sync mode, but the sync took ~107 seconds (sync of 200 hostgroups with assign_filter).

Problems might also come from lots of templates present. Templates are currently used for application grouping (and zone assignment). application var is set on the template level.
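If the application var lives only on the template, a filter on host.vars.application can only be evaluated once the host's inherited vars have been resolved. The following Python sketch shows that resolution step under simplifying assumptions; `resolved_vars`, the `imports` key and the object names are hypothetical and do not mirror the Director's data model.

```python
def resolved_vars(name, objects):
    """Merge vars along the import chain; the object's own vars win."""
    obj = objects[name]
    merged = {}
    for parent in obj.get("imports", []):
        merged.update(resolved_vars(parent, objects))
    merged.update(obj.get("vars", {}))
    return merged


# Hypothetical objects: the var is defined on the template only.
objects = {
    "tpl-icinga": {"vars": {"application": "ICINGA"}},
    "web1": {"imports": ["tpl-icinga"], "vars": {}},
}

web1_vars = resolved_vars("web1", objects)
```

Doing this walk once per host and per filter evaluation gets expensive with hundreds of templates, which is one way templates could contribute to the slowdown.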

Structure:

I'm still thinking about how to track down the issue.

We should definitely talk on Monday, and I can show you the environment so far.

lazyfrosch commented 7 years ago

While the failed-to-render.conf error was caused by the GroupMembershipResolver, it's only appearing in legacy mode of Director.

I'm working on fixing that, but it's a different issue from this one...

lazyfrosch commented 7 years ago

Had a look at what happens during re-calculation of a hostgroup in this environment (after changing a hostgroup).

[Screenshot: objectapplymatches]

You see lots of database queries here that should be prefetched.
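To make the difference between per-object queries and prefetching concrete, here is a small self-contained Python sketch using an in-memory SQLite database. The `host_var` table and its columns are made up for this example and are not the Director's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, not the Director schema
conn.execute("CREATE TABLE host_var (host_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO host_var VALUES (?, ?, ?)",
    [(i, "application", "ICINGA") for i in range(100)],
)

queries = []
conn.set_trace_callback(queries.append)  # records every executed statement

# Per-object lookups: one query per host.
per_host = {}
for host_id in range(100):
    rows = conn.execute(
        "SELECT name, value FROM host_var WHERE host_id = ?", (host_id,)
    ).fetchall()
    per_host[host_id] = dict(rows)
per_host_queries = len(queries)

# Prefetch: a single query, grouped by host in memory.
queries.clear()
prefetched = {}
for host_id, name, value in conn.execute(
    "SELECT host_id, name, value FROM host_var"
):
    prefetched.setdefault(host_id, {})[name] = value
prefetch_queries = len(queries)
```

Both approaches end up with the same data, but the prefetch variant issues far fewer statements, which matters when thousands of hosts are re-evaluated.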

Thomas-Gelf commented 5 years ago

@lazyfrosch: has this been addressed, or is this still an issue?

nilmerg commented 3 years ago

ref/IP/33409