eclipse / hawkbit

Eclipse hawkBit™
https://projects.eclipse.org/projects/iot.hawkbit
Eclipse Public License 2.0
453 stars 187 forks source link

Expensive / slow query with dynamic rollouts on Postgres #1761

Closed wem closed 1 month ago

wem commented 1 month ago

We at Gardena have started evaluating Hawkbit. In our case, we currently have around 300k+ targets. The new feature, "dynamic rollouts," sounds interesting, so we tried it.

After defining some dynamic rollouts, we noted that Hawkbit starts to execute a scheduled (periodic), expensive, and slow query (30min max. duration):

SELECT DISTINCT t1.id                            AS a1,
                t1.tenant                        AS a2,
                t1.address                       AS a3,
                t1.controller_id                 AS a4,
                t1.created_at                    AS a5,
                t1.created_by                    AS a6,
                t1.description                   AS a7,
                t1.install_date                  AS a8,
                t1.last_modified_at              AS a9,
                t1.last_modified_by              AS a10,
                t1.last_target_query             AS a11,
                t1.name                          AS a12,
                t1.optlock_revision              AS a13,
                t1.request_controller_attributes AS a14,
                t1.sec_token                     AS a15,
                t1.update_status                 AS a16,
                t1.assigned_distribution_set     AS a17,
                t1.installed_distribution_set    AS a18,
                t1.target_type                   AS a19
FROM sp_target t1
         LEFT OUTER JOIN sp_target_attributes t4 ON (t4.target_id = t1.id)
         LEFT OUTER JOIN sp_rollouttargetgroup t5 ON ((t5.target_id = t1.id) AND (t5.rolloutGroup_Id >= $1))
         LEFT OUTER JOIN sp_action t3
                         ON ((t3.target = t1.id) AND ((t3.weight > $2) OR ((t3.weight = $3) AND (t3.rollout >= $4))))
         LEFT OUTER JOIN sp_target_type t0 ON (t0.id = t1.target_type)
         LEFT OUTER JOIN (sp_target_type_ds_type_relation t7 JOIN sp_distribution_set_type t2
                          ON (t2.id = t7.distribution_set_type)) ON (t7.target_type = t0.id)
WHERE (((((UPPER(t1.controller_id) LIKE $5 ESCAPE
           $6 AND ((UPPER(t4.attribute_key) = $7) AND UPPER(t4.attribute_value) LIKE $8 ESCAPE $9)) AND
          (t5.target_id IS NULL)) AND (t3.id IS NULL)) AND ((t1.target_type IS NULL) OR (t2.id = $10))) AND
       (t1.tenant = $11))
LIMIT $12 OFFSET $13

The query plan explanation looks like this:

Limit  (cost=3906.61..3906.66 rows=1 width=242)
  ->  Unique  (cost=3906.61..3906.66 rows=1 width=242)
        ->  Sort  (cost=3906.61..3906.61 rows=1 width=242)
"              Sort Key: t1.id, t1.address, t1.controller_id, t1.created_at, t1.created_by, t1.description, t1.install_date, t1.last_modified_at, t1.last_modified_by, t1.last_target_query, t1.name, t1.optlock_revision, t1.request_controller_attributes, t1.sec_token, t1.update_status, t1.assigned_distribution_set, t1.installed_distribution_set, t1.target_type"
              ->  Nested Loop Anti Join  (cost=2.33..3906.60 rows=1 width=242)
                    ->  Nested Loop Left Join  (cost=1.91..430.64 rows=1 width=242)
                          Join Filter: (t7.target_type = t0.id)
                          Filter: ((t1.target_type IS NULL) OR (t2.id = '3'::bigint))
                          ->  Nested Loop Left Join  (cost=0.84..368.36 rows=1 width=250)
                                Join Filter: (t0.id = t1.target_type)
                                ->  Nested Loop Left Join  (cost=0.84..357.24 rows=1 width=242)
                                      Filter: (t3.id IS NULL)
                                      ->  Nested Loop  (cost=0.42..9.93 rows=1 width=242)
                                            ->  Seq Scan on sp_target_attributes t4  (cost=0.00..1.48 rows=1 width=8)
                                                  Filter: ((upper((attribute_value)::text) ~~ 'SMART-GATEWAY-MT7688-HAWKBIT'::text) AND (upper((attribute_key)::text) = 'IMAGEID'::text))
                                            ->  Index Scan using sp_idx_target_prim_sp_target on sp_target t1  (cost=0.42..8.45 rows=1 width=242)
                                                  Index Cond: (((tenant)::text = 'DEFAULT'::text) AND (id = t4.target_id))
                                                  Filter: (upper((controller_id)::text) ~~ '%'::text)
                                      ->  Index Scan using sp_idx_action_02_sp_action on sp_action t3  (cost=0.41..347.30 rows=1 width=16)
                                            Index Cond: (target = t1.id)
                                            Filter: ((weight > 1000) OR ((weight = 1000) AND (rollout >= '23'::bigint)))
                                ->  Seq Scan on sp_target_type t0  (cost=0.00..10.50 rows=50 width=8)
                          ->  Hash Join  (cost=1.07..34.52 rows=1850 width=16)
                                Hash Cond: (t7.distribution_set_type = t2.id)
                                ->  Seq Scan on sp_target_type_ds_type_relation t7  (cost=0.00..28.50 rows=1850 width=16)
                                ->  Hash  (cost=1.03..1.03 rows=3 width=8)
                                      ->  Seq Scan on sp_distribution_set_type t2  (cost=0.00..1.03 rows=3 width=8)
                    ->  Index Only Scan using pk_sp_rollouttargetgroup on sp_rollouttargetgroup t5  (cost=0.42..3480.05 rows=1 width=8)
                          Index Cond: ((rolloutgroup_id >= '64'::bigint) AND (target_id = t1.id))

After a short investigation, we'd propose to add the following indexes to defuse the situation:

sp_rollouttargetgroup.target_id
sp_target_attributes.target_id
sp_action.target

After adding those indexes, the query plan looks much more efficient:

Limit  (cost=50.51..50.56 rows=1 width=242)
  ->  Unique  (cost=50.51..50.56 rows=1 width=242)
        ->  Sort  (cost=50.51..50.51 rows=1 width=242)
"              Sort Key: t1.id, t1.address, t1.controller_id, t1.created_at, t1.created_by, t1.description, t1.install_date, t1.last_modified_at, t1.last_modified_by, t1.last_target_query, t1.name, t1.optlock_revision, t1.request_controller_attributes, t1.sec_token, t1.update_status, t1.assigned_distribution_set, t1.installed_distribution_set, t1.target_type"
              ->  Nested Loop Anti Join  (cost=2.37..50.50 rows=1 width=242)
                    ->  Nested Loop Left Join  (cost=1.95..50.02 rows=1 width=242)
                          Filter: ((t1.target_type IS NULL) OR (t2.id = '3'::bigint))
                          ->  Nested Loop Left Join  (cost=0.71..10.26 rows=1 width=242)
                                Filter: (t3.id IS NULL)
                                ->  Nested Loop  (cost=0.42..9.93 rows=1 width=242)
                                      ->  Seq Scan on sp_target_attributes t4  (cost=0.00..1.48 rows=1 width=8)
                                            Filter: ((upper((attribute_value)::text) ~~ 'SMART-GATEWAY-MT7688-HAWKBIT'::text) AND (upper((attribute_key)::text) = 'IMAGEID'::text))
                                      ->  Index Scan using sp_idx_target_prim_sp_target on sp_target t1  (cost=0.42..8.45 rows=1 width=242)
                                            Index Cond: (((tenant)::text = 'DEFAULT'::text) AND (id = t4.target_id))
                                            Filter: (upper((controller_id)::text) ~~ '%'::text)
                                ->  Index Scan using idx_sp_action_03 on sp_action t3  (cost=0.29..0.32 rows=1 width=16)
                                      Index Cond: (target = t1.id)
                                      Filter: ((weight > 1000) OR ((weight = 1000) AND (rollout >= '23'::bigint)))
                          ->  Hash Right Join  (cost=1.24..39.65 rows=9 width=16)
                                Hash Cond: (t7.target_type = t0.id)
                                ->  Hash Join  (cost=1.07..34.52 rows=1850 width=16)
                                      Hash Cond: (t7.distribution_set_type = t2.id)
                                      ->  Seq Scan on sp_target_type_ds_type_relation t7  (cost=0.00..28.50 rows=1850 width=16)
                                      ->  Hash  (cost=1.03..1.03 rows=3 width=8)
                                            ->  Seq Scan on sp_distribution_set_type t2  (cost=0.00..1.03 rows=3 width=8)
                                ->  Hash  (cost=0.16..0.16 rows=1 width=8)
                                      ->  Index Only Scan using pk_sp_target_type on sp_target_type t0  (cost=0.14..0.16 rows=1 width=8)
                                            Index Cond: (id = t1.target_type)
                    ->  Index Scan using idx_sp_rollouttargetgroup_01 on sp_rollouttargetgroup t5  (cost=0.42..0.46 rows=1 width=8)
                          Index Cond: (target_id = t1.id)
                          Filter: (rolloutgroup_id >= '64'::bigint)

We mainly tested it with Postgres 15.4. Postgres 16.2 brought some query planner improvements, which positively affects that query (without additional indexes), but the end result looks similar.

Please note that Postgres doesn't automatically create an index on foreign key constraints!

Kind regards

avgustinmm commented 1 month ago

Hi @wem, we appreciate your interest in hawkBit, your investigation, and your proposal. At this very moment, we are attempting to optimize the Rollout execution - we have similar problems with MySQL. So, we would evaluate carefully your proposal.

avgustinmm commented 1 month ago

MySQL creates such indexes implicitly and this seems reasonable. So, it seems like something that should be added explicitly for Postgres. You could make a PR with these indexes or we could do that if you like.

avgustinmm commented 1 month ago

@wem, we have prepared PR with the indexes - would you like to check if they improve performance in your use case as expected?

avgustinmm commented 1 month ago

I've merged the PR with the proposed indexes - seems fine.