The page /monitoring/list/servicegroups can be very slow, to the point of not responding, due to missing indexes
To Reproduce
Using:
~1700 hosts
~24k services
21 service groups, with services from Service Sets
I narrowed the slowness in the following sub query :
SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias,
sgo.name1 AS servicegroup_name,
CASE
WHEN ( ss.problem_has_been_acknowledged
+ ss.scheduled_downtime_depth
+ COALESCE(hs.current_state, 0) ) > 0 THEN 1
ELSE 0
end AS service_handled,
CASE
WHEN ss.current_state = 0 THEN
CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS
NULL THEN
16 ELSE
0
end + CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 2
ELSE CASE
WHEN
ss.scheduled_downtime_depth > 0
THEN 1 ELSE 4 end end
ELSE CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked
IS NULL
THEN 16
WHEN
ss.current_state = 1 THEN 32 WHEN ss.current_state = 2
THEN 128
WHEN
ss.current_state =
3 THEN 64 ELSE 256 end + CASE WHEN hs.current_state > 0
THEN 1024
ELSE
CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 512
ELSE CASE
WHEN
ss.scheduled_downtime_depth > 0 THEN 256 ELSE 2048 end end
end
end AS service_severity,
CASE
WHEN ss.has_been_checked = 0
OR ss.has_been_checked IS NULL THEN 99
ELSE ss.current_state
end AS service_state
FROM icinga_objects AS sgo
INNER JOIN icinga_servicegroups AS sg
ON sg.servicegroup_object_id = sgo.object_id
AND sgo.objecttype_id = 4
AND sgo.is_active = 1
INNER JOIN icinga_servicegroup_members AS sgm
ON sgm.servicegroup_id = sg.servicegroup_id
INNER JOIN icinga_objects AS so
ON so.object_id = sgm.service_object_id
AND so.objecttype_id = 2
AND so.is_active = 1
INNER JOIN icinga_services AS s
ON s.service_object_id = so.object_id
INNER JOIN icinga_hoststatus AS hs
ON hs.host_object_id = s.host_object_id
INNER JOIN icinga_servicestatus AS ss
ON ss.service_object_id = so.object_id
GROUP BY sgo.object_id,
sg.servicegroup_id,
ss.servicestatus_id,
hs.hoststatus_id
UNION ALL
SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias,
sgo.name1 AS servicegroup_name,
NULL AS service_handled,
0 AS service_severity,
NULL AS service_state
FROM icinga_objects AS sgo
INNER JOIN icinga_servicegroups AS sg
ON sg.servicegroup_object_id = sgo.object_id
AND sgo.objecttype_id = 4
AND sgo.is_active = 1
LEFT JOIN icinga_servicegroup_members AS esgm
ON esgm.servicegroup_id = sg.servicegroup_id
GROUP BY sgo.object_id,
sg.servicegroup_id
HAVING ( COUNT(esgm.servicegroup_member_id) = 0 )
(prettyfied for readability)
An explain of the query before:
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+---------------------------------------------------------+------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+---------------------------------------------------------+------+----------+-------------------------------------------------+
| 1 | PRIMARY | hs | ALL | object_id | NULL | NULL | NULL | 3319 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | sg | ALL | PRIMARY | NULL | NULL | NULL | 32 | 100.00 | Using where; Using join buffer (flat, BNL join) |
| 1 | PRIMARY | sgo | eq_ref | PRIMARY,objecttype_id,sla_idx_obj | PRIMARY | 8 | icinga2.sg.servicegroup_object_id | 1 | 0.04 | Using where |
| 1 | PRIMARY | sgm | ref | sgmbrs_sgid_soid,idx_servicegroup_members_object_id | sgmbrs_sgid_soid | 9 | icinga2.sg.servicegroup_id | 374 | 100.00 | Using where; Using index |
| 1 | PRIMARY | so | eq_ref | PRIMARY,objecttype_id,sla_idx_obj | PRIMARY | 8 | icinga2.sgm.service_object_id | 1 | 50.00 | Using where |
| 1 | PRIMARY | ss | eq_ref | object_id | object_id | 9 | icinga2.sgm.service_object_id | 1 | 100.00 | |
| 1 | PRIMARY | s | ref | services_host_object_id_idx,services_combined_object_idx | services_combined_object_idx | 18 | icinga2.sgm.service_object_id,icinga2.hs.host_object_id | 1 | 100.00 | Using index |
| 2 | UNION | sgo | ref | PRIMARY,objecttype_id,sla_idx_obj | sla_idx_obj | 12 | const,const | 21 | 100.00 | Using index; Using temporary; Using filesort |
| 2 | UNION | sg | ALL | NULL | NULL | NULL | NULL | 32 | 100.00 | Using where; Using join buffer (flat, BNL join) |
| 2 | UNION | esgm | ref | sgmbrs_sgid_soid | sgmbrs_sgid_soid | 9 | icinga2.sg.servicegroup_id | 374 | 100.00 | Using index |
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+---------------------------------------------------------+------+----------+-------------------------------------------------+
Expected behavior
The query should be quick.
Screenshots
MariaDB [icinga2]> SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias, sgo.name1 AS servicegroup_name, CASE WHEN ( ss.problem_has_been_acknowledged + ss.scheduled_downtime_depth
+ COALESCE(hs.current_state, 0) ) > 0 THEN 1 ELSE 0 end AS service_handled, CASE WHEN ss.current_state = 0 THEN CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS
NULL THEN 16 ELSE 0 end + CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 2 ELSE CASE WHEN ss.scheduled_downtime_depth > 0 THEN 1 ELSE 4 end end ELSE CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 16 WHEN ss.current_state = 1 THEN 32 WHEN ss.current_state = 2 THEN 128 WHEN ss.current_state = 3 THEN 64 ELSE 256 end + CASE WHEN hs.current_state > 0 THEN 1024 ELSE CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 512 ELSE CASE WHEN
ss.scheduled_downtime_depth > 0 THEN 256 ELSE 2048 end end end end AS service_severity, CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 99
ELSE ss.current_state end AS service_state FROM icinga_objects AS sgo INNER JOIN icinga_servicegroups AS sg ON sg.servicegroup_object_id = sgo.object_id AND sgo.objecttype_id = 4
AND sgo.is_active = 1 INNER JOIN icinga_servicegroup_members AS sgm ON sgm.servicegroup_id = sg.servicegroup_id INNER JOIN icinga_objects AS so ON so.object_id = sgm.service_object_id
AND so.objecttype_id = 2 AND so.is_active = 1 INNER JOIN icinga_services AS s ON s.service_object_id = so.object_id INNER JOIN icinga_hoststatus AS hs ON hs.host_object_id = s.host_object_id
INNER JOIN icinga_servicestatus AS ss ON ss.service_object_id = so.object_id GROUP BY sgo.object_id, sg.servicegroup_id, ss.servicestatus_id, hs.hoststatus_id UNION ALL SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias, sgo.name1 AS servicegroup_name, NULL AS service_handled, 0 AS service_severity, NULL
AS service_state FROM icinga_objects AS sgo INNER JOIN icinga_servicegroups AS sg ON sg.servicegroup_object_id = sgo.object_id AND sgo.objecttype_id = 4 AND sgo.is_active = 1 LEFT JOIN icinga_servicegroup_members AS esgm ON esgm.servicegroup_id = sg.servicegroup_id GROUP BY sgo.object_id, sg.servicegroup_id HAVING ( COUNT(esgm.servicegroup_member_id) = 0 );
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
MariaDB [icinga2]>
(query was still running after 60 seconds)
The solution
I added an index on icinga_servicegroups using the command:
CREATE INDEX idx_servicegroup_object_id ON icinga_servicegroups (servicegroup_object_id);
Describe the bug
The page
/monitoring/list/servicegroups
can be very slow, to the point of not responding, due to missing indexesTo Reproduce
Using:
I narrowed the slowness in the following sub query :
(prettyfied for readability)
An explain of the query before:
Expected behavior
The query should be quick.
Screenshots
(query was still running after 60 seconds)
The solution
I added an index on icinga_servicegroups using the command:
The
explain extended
of the query with the index:The query computes now in :
Your Environment
Icinga Web 2 version and modules (System - About): 2.9.3 (c757a17530c326c7d6daebd6171486cb8167a8a1)
Web browser used: Firefox 78.14 esr (debian testing)
Icinga 2 version used (
icinga2 --version
): r2.13.1-1 (debian packages)PHP version used (
php --version
): 7.4.21 (debian)Database: mariadb 10.5.11-MariaDB-1-log (debian)
Server operating system and version: debian 11
1 single master
7 satellites, with 3 in HA mode
1 webserver (not on the master)