Open lazyfrosch opened 7 years ago
Still in investigation.
Things also to note:
icinga_host_resolved_var is empty, not sure when that would be populated.
When changing one of the hosts that would have matched, the group instantly got applied. icinga_host_resolved_var is still empty.
Don't worry, *_resolved_var has been postponed and is not in use yet. Recalculating filters is tricky. When removing the assign_filter from a single Hostgroup, recalculation must be triggered, as the filter could formerly have matched a bunch of hosts.
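To illustrate why removing a filter still needs a recalculation, here is a minimal sketch in Python. It is not Director's actual resolver: the function names, the in-memory host dictionary and the callable filter are all assumptions made for illustration. The point is only that the old filter's matches have to be compared against the new (empty) result.

```python
# Illustrative sketch only; not the GroupMembershipResolver implementation.
# All names and the data layout here are hypothetical.

def resolve_members(hosts, assign_filter):
    """Return the names of hosts matched by the filter; no filter matches nothing."""
    if assign_filter is None:
        return set()
    return {name for name, host_vars in hosts.items() if assign_filter(host_vars)}


def membership_changes(hosts, old_filter, new_filter):
    """Compare the resolved members before and after a filter change."""
    before = resolve_members(hosts, old_filter)
    after = resolve_members(hosts, new_filter)
    return {"remove": before - after, "add": after - before}


hosts = {
    "web1": {"application": "ICINGA"},
    "web2": {"application": "ICINGA"},
    "db1": {"application": "OTHER"},
}


def is_icinga(host_vars):
    return host_vars.get("application") == "ICINGA"


# Removing the filter: every host that used to match has to be dropped.
changes = membership_changes(hosts, old_filter=is_icinga, new_filter=None)
print(sorted(changes["remove"]))  # ['web1', 'web2']
```

Even though the new filter is empty, the former matches can only be found by looking at what the old filter resolved to.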
My local test environment has 2000 hosts, about 50 Hostgroups with apply rules generating more than 10.000 mappings. Having a Host/Hostgroup relation of 3:1 is ... special. Probably an indication of Hostgroups (mis-)used for a special purpose, like permissions, combinations of properties, whatever. But even if this looks unusual to me, I'm pretty sure there is a valid reason for this.
There is a lot of room for tuning with various tricks when it comes to the membership resolver. Still, I wouldn't have released it if in doubt about its performance. To give me a better understanding of your setup, could you please let me know what this query is telling you:
SELECT COUNT(DISTINCT assign_filter) FROM icinga_hostgroup;
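As a rough illustration of what that number means, the sketch below runs the same query against an in-memory SQLite table. The two-column table is a simplified stand-in for icinga_hostgroup, and the group names and filter strings are made up. It shows that groups sharing an identical filter are counted once and that groups without a filter (NULL) are not counted at all.

```python
import sqlite3

# Simplified stand-in for the real table; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icinga_hostgroup (object_name TEXT, assign_filter TEXT)")
conn.executemany(
    "INSERT INTO icinga_hostgroup VALUES (?, ?)",
    [
        ("ICINGA", 'host.vars.application == "ICINGA"'),
        ("ICINGA_COPY", 'host.vars.application == "ICINGA"'),  # same filter again
        ("OTHER", 'host.vars.application == "OTHER"'),
        ("STATIC", None),  # group without an apply rule
    ],
)

total = conn.execute("SELECT COUNT(*) FROM icinga_hostgroup").fetchone()[0]
distinct_filters = conn.execute(
    "SELECT COUNT(DISTINCT assign_filter) FROM icinga_hostgroup"
).fetchone()[0]

print(total, distinct_filters)  # 4 2
```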
Thanks, Thomas
NB: while the first and last error above could possibly be a consequence of your applied patch, the one in the middle is quite strange. There is definitely something wrong with the type-casting in that class, but I do not understand where and why this would affect the config preview of a Group. Could you please let me have your exact Git ref and any patches you may have applied?
Yeah, it's one of the largest environments I guess. Hostgroups are used for combinations of Applications and Stages.
Example:
ICINGA -> host.vars.application == "ICINGA"
ICINGA_PROD -> host.vars.application == "ICINGA" && host.vars.stage == "PROD"
But for now, I'm just experimenting with all of the hosts and about 200 hostgroups with a filter.
These have been set during sync, and re-calculation worked in sync mode, but the sync took ~107 seconds (sync of 200 hostgroups with assign_filter).
Problems might also come from the large number of templates present. Templates are currently used for application grouping (and zone assignment). The application var is set on the template level.
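Since the var lives on a template, a filter on it presumably cannot be answered from the host's own vars alone; the inherited vars have to be merged first. The sketch below shows that idea under stated assumptions: the object layout, the "imports" key and the template names are hypothetical, and the merge order (later imports override earlier ones, own vars override everything) is the assumed rule, not taken from Director's code.

```python
# Illustrative sketch; object layout and names are hypothetical.

def resolved_vars(name, objects):
    """Merge vars from all imported templates, then apply the object's own vars."""
    obj = objects[name]
    merged = {}
    for parent in obj.get("imports", []):
        # Later imports override earlier ones (assumed merge order).
        merged.update(resolved_vars(parent, objects))
    merged.update(obj.get("vars", {}))
    return merged


objects = {
    "generic-host": {"vars": {}},
    "app-icinga": {"imports": ["generic-host"], "vars": {"application": "ICINGA"}},
    "web1": {"imports": ["app-icinga"], "vars": {"role": "frontend"}},
}

own = objects["web1"]["vars"]
full = resolved_vars("web1", objects)

print("application" in own)  # False: the host itself does not set it
print(full["application"])   # ICINGA: only visible after resolving templates
```

With several hundred templates, this resolution step runs for every host a filter is checked against, which may be part of the cost.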
Structure:
I'm still thinking about how to track down the issue.
We should definitely talk on Monday, and I can show you the environment so far.
While the failed-to-render.conf error was caused by the GroupMembershipResolver, it only appears in legacy mode of Director.
I'm working on fixing that, but it is a different issue from this one...
Had a look at what happens during re-calculation of a hostgroup in this environment (after changing a hostgroup).
You see lots of database queries here that should be prefetched.
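The difference between per-object queries and prefetching can be sketched as follows. This is a generic illustration with an in-memory SQLite table, not Director's schema or code: the host_var table, its columns and both function names are assumptions. The first function issues one query per host, the second loads all rows once and groups them in memory.

```python
import sqlite3

# Hypothetical, simplified table holding one row per host variable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host_var (host TEXT, varname TEXT, varvalue TEXT)")
conn.executemany(
    "INSERT INTO host_var VALUES (?, ?, ?)",
    [
        ("web1", "application", "ICINGA"),
        ("web2", "application", "ICINGA"),
        ("db1", "application", "OTHER"),
    ],
)
host_names = ["web1", "web2", "db1"]


def vars_per_host(conn, host_names):
    """One query per host: the query count grows with the number of hosts."""
    queries = 0
    result = {}
    for name in host_names:
        rows = conn.execute(
            "SELECT varname, varvalue FROM host_var WHERE host = ?", (name,)
        ).fetchall()
        queries += 1
        result[name] = dict(rows)
    return result, queries


def vars_prefetched(conn, host_names):
    """A single query loads every row; the rows are then grouped in memory."""
    result = {name: {} for name in host_names}
    rows = conn.execute("SELECT host, varname, varvalue FROM host_var").fetchall()
    for host, varname, varvalue in rows:
        if host in result:
            result[host][varname] = varvalue
    return result, 1


slow, slow_queries = vars_per_host(conn, host_names)
fast, fast_queries = vars_prefetched(conn, host_names)
print(slow == fast, slow_queries, fast_queries)  # True 3 1
```

Both variants return the same data; with thousands of hosts, the first one issues thousands of queries where the second issues one.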
@lazyfrosch: has this been addressed, or is this still an issue?
ref/IP/33409
Host objects
7390 objects have been defined, 654 of them are templates, 1851 related group objects have been created
Adding assign_filter to one single hostgroup, which would match < 10 hosts.
~When going for preview of that group: (other error)~
Removing assign_filter will lead to re-calculation, even if the filter is empty now.
Also related is #1250 (Change applied here)