Icinga / icingaweb2

A lightweight and extensible web interface to keep an eye on your environment. Analyse problems and act on them.
https://icinga.com/get-started/
GNU General Public License v2.0
806 stars 279 forks source link

Slow servicegroups page #4545

Open Yoda-BZH opened 3 years ago

Yoda-BZH commented 3 years ago

Describe the bug

The page /monitoring/list/servicegroups can be very slow, to the point of not responding, due to missing indexes

To Reproduce

Using:

I narrowed the slowness in the following sub query :

SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias,
       sgo.name1                          AS servicegroup_name,
       CASE
         WHEN ( ss.problem_has_been_acknowledged
                + ss.scheduled_downtime_depth
                + COALESCE(hs.current_state, 0) ) > 0 THEN 1
         ELSE 0
       end                                AS service_handled,
       CASE
         WHEN ss.current_state = 0 THEN
         CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS
         NULL THEN
         16 ELSE
         0
         end + CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 2
         ELSE CASE
         WHEN
         ss.scheduled_downtime_depth > 0
         THEN 1 ELSE 4 end end
         ELSE CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked
              IS NULL
              THEN 16
              WHEN
              ss.current_state = 1 THEN 32 WHEN ss.current_state = 2
              THEN 128
              WHEN
              ss.current_state =
              3 THEN 64 ELSE 256 end + CASE WHEN hs.current_state > 0
              THEN 1024
              ELSE
              CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 512
              ELSE CASE
              WHEN
              ss.scheduled_downtime_depth > 0 THEN 256 ELSE 2048 end end
              end
       end                                AS service_severity,
       CASE
         WHEN ss.has_been_checked = 0
               OR ss.has_been_checked IS NULL THEN 99
         ELSE ss.current_state
       end                                AS service_state
FROM   icinga_objects AS sgo
       INNER JOIN icinga_servicegroups AS sg
               ON sg.servicegroup_object_id = sgo.object_id
                  AND sgo.objecttype_id = 4
                  AND sgo.is_active = 1
       INNER JOIN icinga_servicegroup_members AS sgm
               ON sgm.servicegroup_id = sg.servicegroup_id
       INNER JOIN icinga_objects AS so
               ON so.object_id = sgm.service_object_id
                  AND so.objecttype_id = 2
                  AND so.is_active = 1
       INNER JOIN icinga_services AS s
               ON s.service_object_id = so.object_id
       INNER JOIN icinga_hoststatus AS hs
               ON hs.host_object_id = s.host_object_id
       INNER JOIN icinga_servicestatus AS ss
               ON ss.service_object_id = so.object_id
GROUP  BY sgo.object_id,
          sg.servicegroup_id,
          ss.servicestatus_id,
          hs.hoststatus_id
UNION ALL
SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias,
       sgo.name1                          AS servicegroup_name,
       NULL                               AS service_handled,
       0                                  AS service_severity,
       NULL                               AS service_state
FROM   icinga_objects AS sgo
       INNER JOIN icinga_servicegroups AS sg
               ON sg.servicegroup_object_id = sgo.object_id
                  AND sgo.objecttype_id = 4
                  AND sgo.is_active = 1
       LEFT JOIN icinga_servicegroup_members AS esgm
              ON esgm.servicegroup_id = sg.servicegroup_id
GROUP  BY sgo.object_id,
          sg.servicegroup_id
HAVING ( COUNT(esgm.servicegroup_member_id) = 0 )

(prettyfied for readability)

An explain of the query before:

+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+---------------------------------------------------------+------+----------+-------------------------------------------------+
| id   | select_type | table | type   | possible_keys                                            | key                          | key_len | ref                                                     | rows | filtered | Extra                                           |
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+---------------------------------------------------------+------+----------+-------------------------------------------------+
|    1 | PRIMARY     | hs    | ALL    | object_id                                                | NULL                         | NULL    | NULL                                                    | 3319 |   100.00 | Using where; Using temporary; Using filesort    |
|    1 | PRIMARY     | sg    | ALL    | PRIMARY                                                  | NULL                         | NULL    | NULL                                                    | 32   |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | sgo   | eq_ref | PRIMARY,objecttype_id,sla_idx_obj                        | PRIMARY                      | 8       | icinga2.sg.servicegroup_object_id                       | 1    |     0.04 | Using where                                     |
|    1 | PRIMARY     | sgm   | ref    | sgmbrs_sgid_soid,idx_servicegroup_members_object_id      | sgmbrs_sgid_soid             | 9       | icinga2.sg.servicegroup_id                              | 374  |   100.00 | Using where; Using index                        |
|    1 | PRIMARY     | so    | eq_ref | PRIMARY,objecttype_id,sla_idx_obj                        | PRIMARY                      | 8       | icinga2.sgm.service_object_id                           | 1    |    50.00 | Using where                                     |
|    1 | PRIMARY     | ss    | eq_ref | object_id                                                | object_id                    | 9       | icinga2.sgm.service_object_id                           | 1    |   100.00 |                                                 |
|    1 | PRIMARY     | s     | ref    | services_host_object_id_idx,services_combined_object_idx | services_combined_object_idx | 18      | icinga2.sgm.service_object_id,icinga2.hs.host_object_id | 1    |   100.00 | Using index                                     |
|    2 | UNION       | sgo   | ref    | PRIMARY,objecttype_id,sla_idx_obj                        | sla_idx_obj                  | 12      | const,const                                             | 21   |   100.00 | Using index; Using temporary; Using filesort    |
|    2 | UNION       | sg    | ALL    | NULL                                                     | NULL                         | NULL    | NULL                                                    | 32   |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | UNION       | esgm  | ref    | sgmbrs_sgid_soid                                         | sgmbrs_sgid_soid             | 9       | icinga2.sg.servicegroup_id                              | 374  |   100.00 | Using index                                     |
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+---------------------------------------------------------+------+----------+-------------------------------------------------+

Expected behavior

The query should be quick.

Screenshots

MariaDB [icinga2]> SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias,                sgo.name1                          AS servicegroup_name,                CASE                  WHEN ( ss.problem_has_been_acknowledged                         + ss.scheduled_downtime_depth
        + COALESCE(hs.current_state, 0) ) > 0 THEN 1                  ELSE 0                end                                AS service_handled,                CASE                  WHEN ss.current_state = 0 THEN                  CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS
      NULL THEN                  16 ELSE                  0                  end + CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 2                  ELSE CASE                  WHEN                  ss.scheduled_downtime_depth > 0                  THEN 1 ELSE 4 end end                  ELSE CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked                       IS NULL                       THEN 16                       WHEN                       ss.current_state = 1 THEN 32 WHEN ss.current_state = 2                       THEN 128                       WHEN                       ss.current_state =                       3 THEN 64 ELSE 256 end + CASE WHEN hs.current_state > 0                       THEN 1024                       ELSE                       CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 512                       ELSE CASE                       WHEN
   ss.scheduled_downtime_depth > 0 THEN 256 ELSE 2048 end end                       end                end                                AS service_severity,                CASE                  WHEN ss.has_been_checked = 0                        OR ss.has_been_checked IS NULL THEN 99
 ELSE ss.current_state                end                                AS service_state         FROM   icinga_objects AS sgo                INNER JOIN icinga_servicegroups AS sg                        ON sg.servicegroup_object_id = sgo.object_id                           AND sgo.objecttype_id = 4
                       AND sgo.is_active = 1                INNER JOIN icinga_servicegroup_members AS sgm                        ON sgm.servicegroup_id = sg.servicegroup_id                INNER JOIN icinga_objects AS so                        ON so.object_id = sgm.service_object_id
      AND so.objecttype_id = 2                           AND so.is_active = 1                INNER JOIN icinga_services AS s                        ON s.service_object_id = so.object_id                INNER JOIN icinga_hoststatus AS hs                        ON hs.host_object_id = s.host_object_id
           INNER JOIN icinga_servicestatus AS ss                        ON ss.service_object_id = so.object_id         GROUP  BY sgo.object_id,                   sg.servicegroup_id,                   ss.servicestatus_id,                   hs.hoststatus_id         UNION ALL         SELECT sg.alias COLLATE latin1_general_ci AS servicegroup_alias,                sgo.name1                          AS servicegroup_name,                NULL                               AS service_handled,                0                                  AS service_severity,                NULL
    AS service_state         FROM   icinga_objects AS sgo                INNER JOIN icinga_servicegroups AS sg                        ON sg.servicegroup_object_id = sgo.object_id                           AND sgo.objecttype_id = 4                           AND sgo.is_active = 1                LEFT JOIN icinga_servicegroup_members AS esgm                       ON esgm.servicegroup_id = sg.servicegroup_id         GROUP  BY sgo.object_id,                   sg.servicegroup_id         HAVING ( COUNT(esgm.servicegroup_member_id) = 0 );

^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
MariaDB [icinga2]> 

(query was still running after 60 seconds)

The solution

I added an index on icinga_servicegroups using the command:

CREATE INDEX idx_servicegroup_object_id ON icinga_servicegroups (servicegroup_object_id);

The explain extended of the query with the index:

+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+-------------------------------+------+----------+----------------------------------------------+
| id   | select_type | table | type   | possible_keys                                            | key                          | key_len | ref                           | rows | filtered | Extra                                        |
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+-------------------------------+------+----------+----------------------------------------------+
|    1 | PRIMARY     | sgo   | ref    | PRIMARY,objecttype_id,sla_idx_obj                        | sla_idx_obj                  | 12      | const,const                   | 21   |   100.00 | Using index; Using temporary; Using filesort |
|    1 | PRIMARY     | sg    | ref    | PRIMARY,idx_servicegroup_object_id                       | idx_servicegroup_object_id   | 9       | icinga2.sgo.object_id         | 1    |   100.00 |                                              |
|    1 | PRIMARY     | sgm   | ref    | sgmbrs_sgid_soid,idx_servicegroup_members_object_id      | sgmbrs_sgid_soid             | 9       | icinga2.sg.servicegroup_id    | 374  |   100.00 | Using where; Using index                     |
|    1 | PRIMARY     | so    | eq_ref | PRIMARY,objecttype_id,sla_idx_obj                        | PRIMARY                      | 8       | icinga2.sgm.service_object_id | 1    |    50.00 | Using where                                  |
|    1 | PRIMARY     | s     | ref    | services_host_object_id_idx,services_combined_object_idx | services_combined_object_idx | 9       | icinga2.sgm.service_object_id | 1    |   100.00 | Using where; Using index                     |
|    1 | PRIMARY     | hs    | eq_ref | object_id                                                | object_id                    | 9       | icinga2.s.host_object_id      | 1    |   100.00 |                                              |
|    1 | PRIMARY     | ss    | eq_ref | object_id                                                | object_id                    | 9       | icinga2.sgm.service_object_id | 1    |   100.00 |                                              |
|    2 | UNION       | sgo   | ref    | PRIMARY,objecttype_id,sla_idx_obj                        | sla_idx_obj                  | 12      | const,const                   | 21   |   100.00 | Using index; Using temporary; Using filesort |
|    2 | UNION       | sg    | ref    | idx_servicegroup_object_id                               | idx_servicegroup_object_id   | 9       | icinga2.sgo.object_id         | 1    |   100.00 |                                              |
|    2 | UNION       | esgm  | ref    | sgmbrs_sgid_soid                                         | sgmbrs_sgid_soid             | 9       | icinga2.sg.servicegroup_id    | 374  |   100.00 | Using index                                  |
+------+-------------+-------+--------+----------------------------------------------------------+------------------------------+---------+-------------------------------+------+----------+----------------------------------------------+

The query computes now in :

12196 rows in set (0.125 sec)

Your Environment

Yoda-BZH commented 2 years ago

linked to https://github.com/Icinga/icinga2/issues/9017