Current queries that are slow on SkyPortal have access policies that are instances of AccessibleIfRelatedRowsAreAccessible(public_relationship=access_mode), that is to say, anything that requires the ability to access a related record that itself has a public access policy.
The reason this is slow is because joining against publicly accessible records renders the entire ID column of the associated table as a subquery, which is slow. In reality, this join doesn't need to be performed, as it does not filter the output of the calling join in any way.
Example:
This query for accessible annotations is currently slow. The associated policy is:
fritz-1=> explain analyze SELECT
annotations.id AS annotations_id,
annotations.created_at AS annotations_created_at,
annotations.modified AS annotations_modified,
annotations.DATA AS annotations_data,
annotations.author_id AS annotations_author_id,
annotations.origin AS annotations_origin,
annotations.obj_id AS annotations_obj_id
FROM
annotations
JOIN (
SELECT
annotations_1.id AS id
FROM
annotations AS annotations_1
JOIN
group_annotations AS group_annotations_1
ON
annotations_1.id = group_annotations_1.annotation_id
JOIN
GROUPS
ON
GROUPS.id = group_annotations_1.group_id
JOIN
group_users AS group_users_1
ON
GROUPS.id = group_users_1.group_id
JOIN
users
ON
users.id = group_users_1.user_id
WHERE users.id = 3) as anon_1 ON
anon_1.id = annotations.id
JOIN (
SELECT
annotations_2.id AS id
FROM
annotations AS annotations_2
JOIN
objs
ON
objs.id = annotations_2.obj_id
JOIN (
SELECT
objs.id AS id
FROM
objs) AS anon_3
ON
anon_3.id = objs.id) AS anon_2
ON
annotations.obj_id ='ZTF21aastzsn';
This query takes about 1s to execute:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=33085.52..75745.23 rows=1 width=267) (actual time=793.877..840.945 rows=66 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=32085.52..74745.13 rows=1 width=267) (actual time=756.329..788.281 rows=22 loops=3)
Hash Cond: (annotations_2.id = annotations_1.id)
-> Parallel Hash Join (cost=32046.31..73648.49 rows=281978 width=4) (actual time=627.583..756.645 rows=224809 loops=3)
Hash Cond: ((objs.id)::text = (objs_1.id)::text)
-> Parallel Hash Join (cost=16023.15..52002.14 rows=281978 width=30) (actual time=287.578..433.801 rows=224809 loops=3)
Hash Cond: ((annotations_2.obj_id)::text = (objs.id)::text)
-> Parallel Seq Scan on annotations annotations_2 (cost=0.00..30905.78 rows=281978 width=17) (actual time=0.016..110.367 rows=224809 loops=3)
-> Parallel Hash (cost=12367.18..12367.18 rows=210318 width=13) (actual time=99.105..99.106 rows=168578 loops=3)
Buckets: 131072 Batches: 8 Memory Usage: 4032kB
-> Parallel Seq Scan on objs (cost=0.00..12367.18 rows=210318 width=13) (actual time=0.019..48.660 rows=168578 loops=3)
-> Parallel Hash (cost=12367.18..12367.18 rows=210318 width=13) (actual time=108.617..108.618 rows=168578 loops=3)
Buckets: 131072 Batches: 8 Memory Usage: 4032kB
-> Parallel Seq Scan on objs objs_1 (cost=0.00..12367.18 rows=210318 width=13) (actual time=0.023..57.424 rows=168578 loops=3)
-> Hash (cost=39.20..39.20 rows=1 width=275) (actual time=1.555..1.562 rows=66 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 20kB
-> Nested Loop (cost=1.97..39.20 rows=1 width=275) (actual time=0.487..1.509 rows=66 loops=3)
-> Nested Loop (cost=1.82..31.03 rows=1 width=279) (actual time=0.418..1.308 rows=66 loops=3)
-> Nested Loop (cost=1.55..30.74 rows=1 width=287) (actual time=0.362..1.088 rows=66 loops=3)
-> Nested Loop (cost=1.27..30.14 rows=2 width=279) (actual time=0.217..0.745 rows=119 loops=3)
-> Nested Loop (cost=0.85..29.09 rows=2 width=271) (actual time=0.128..0.348 rows=11 loops=3)
-> Index Scan using ix_annotations_obj_id on annotations (cost=0.42..12.21 rows=2 width=267) (actual time=0.067..0.169 rows=11 loops=3)
Index Cond: ((obj_id)::text = 'ZTF21aastzsn'::text)
-> Index Only Scan using annotations_pkey on annotations annotations_1 (cost=0.42..8.44 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=33)
Index Cond: (id = annotations.id)
Heap Fetches: 36
-> Index Only Scan using group_annotations_reverse_ind on group_annotations group_annotations_1 (cost=0.42..0.50 rows=2 width=8) (actual time=0.024..0.032 rows=11 loops=33)
Index Cond: (annotation_id = annotations_1.id)
Heap Fetches: 357
-> Index Only Scan using group_users_forward_ind on group_users group_users_1 (cost=0.28..0.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=357)
Index Cond: ((group_id = group_annotations_1.group_id) AND (user_id = 3))
Heap Fetches: 198
-> Index Only Scan using groups_pkey on groups (cost=0.27..0.29 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=198)
Index Cond: (id = group_annotations_1.group_id)
Heap Fetches: 198
-> Index Only Scan using users_pkey on users (cost=0.14..8.16 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=198)
Index Cond: (id = 3)
Heap Fetches: 198
Planning Time: 2.966 ms
Execution Time: 841.120 ms
(42 rows)
The reason this query is slow is because of the following part of the query:
JOIN (
SELECT
objs.id AS id
FROM
objs) AS anon_3
ON
anon_3.id = objs.id)
This is the part of the query that enforces AccessibleIfRelatedRowsAreAccessible(obj='read'). It has no effect since Objs are already ready public.
If we remove this, the query is 800x faster (whole thing takes 1ms):
fritz-1=> explain analyze SELECT
annotations.id AS annotations_id,
annotations.created_at AS annotations_created_at,
annotations.modified AS annotations_modified,
annotations.DATA AS annotations_data,
annotations.author_id AS annotations_author_id,
annotations.origin AS annotations_origin,
annotations.obj_id AS annotations_obj_id
FROM
annotations
JOIN (
SELECT
annotations_1.id AS id
FROM
annotations AS annotations_1
JOIN
group_annotations AS group_annotations_1
ON
annotations_1.id = group_annotations_1.annotation_id
JOIN
GROUPS
ON
GROUPS.id = group_annotations_1.group_id
JOIN
group_users AS group_users_1
ON
GROUPS.id = group_users_1.group_id
JOIN
users
ON
users.id = group_users_1.user_id
WHERE users.id = 3) as anon_1 ON
anon_1.id = annotations.id
JOIN (
SELECT
annotations_2.id AS id
FROM
annotations AS annotations_2
JOIN
objs
ON
objs.id = annotations_2.obj_id) AS anon_2
ON
anon_2.id = annotations.id
WHERE
annotations.obj_id ='ZTF21aastzsn';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.81..40.26 rows=1 width=267) (actual time=0.179..1.453 rows=66 loops=1)
-> Nested Loop (cost=2.39..39.81 rows=1 width=280) (actual time=0.156..0.846 rows=66 loops=1)
-> Nested Loop (cost=2.25..31.63 rows=1 width=284) (actual time=0.145..0.744 rows=66 loops=1)
-> Nested Loop (cost=1.98..31.35 rows=1 width=292) (actual time=0.134..0.621 rows=66 loops=1)
-> Nested Loop (cost=1.55..30.75 rows=1 width=300) (actual time=0.129..0.489 rows=66 loops=1)
-> Nested Loop (cost=1.27..30.15 rows=2 width=292) (actual time=0.066..0.279 rows=119 loops=1)
Join Filter: (annotations.id = group_annotations_1.annotation_id)
-> Nested Loop (cost=0.85..29.09 rows=2 width=284) (actual time=0.047..0.116 rows=11 loops=1)
-> Index Scan using ix_annotations_obj_id on annotations (cost=0.42..12.21 rows=2 width=267) (actual time=0.033..0.061 rows=11 loops=1)
Index Cond: ((obj_id)::text = 'ZTF21aastzsn'::text)
-> Index Scan using annotations_pkey on annotations annotations_2 (cost=0.42..8.44 rows=1 width=17) (actual time=0.004..0.004 rows=1 loops=11)
Index Cond: (id = annotations.id)
-> Index Only Scan using group_annotations_reverse_ind on group_annotations group_annotations_1 (cost=0.42..0.50 rows=2 width=8) (actual time=0.007..0.011 rows=11 loops=11)
Index Cond: (annotation_id = annotations_2.id)
Heap Fetches: 119
-> Index Only Scan using group_users_forward_ind on group_users group_users_1 (cost=0.28..0.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=119)
Index Cond: ((group_id = group_annotations_1.group_id) AND (user_id = 3))
Heap Fetches: 66
-> Index Only Scan using annotations_pkey on annotations annotations_1 (cost=0.42..0.60 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=66)
Index Cond: (id = group_annotations_1.annotation_id)
Heap Fetches: 66
-> Index Only Scan using groups_pkey on groups (cost=0.27..0.29 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=66)
Index Cond: (id = group_annotations_1.group_id)
Heap Fetches: 66
-> Index Only Scan using users_pkey on users (cost=0.14..8.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=66)
Index Cond: (id = 3)
Heap Fetches: 66
-> Index Only Scan using objs_pkey on objs (cost=0.42..0.46 rows=1 width=13) (actual time=0.009..0.009 rows=1 loops=66)
Index Cond: (id = (annotations_2.obj_id)::text)
Heap Fetches: 66
Planning Time: 6.589 ms
Execution Time: 1.582 ms
(32 rows)
Similar for the next longest query:
SELECT
GROUPS.id AS groups_id,
GROUPS.created_at AS groups_created_at,
GROUPS.modified AS groups_modified,
GROUPS.name AS groups_name,
GROUPS.nickname AS groups_nickname,
GROUPS.private AS groups_private,
GROUPS.single_user_group AS groups_single_user_group
FROM
GROUPS
JOIN (
SELECT
sources.created_at AS created_at,
sources.modified AS modified,
sources.id AS id,
sources.group_id AS group_id,
sources.obj_id AS obj_id,
sources.saved_by_id AS saved_by_id,
sources.saved_at AS saved_at,
sources.active AS active,
sources.requested AS requested,
sources.unsaved_by_id AS unsaved_by_id,
sources.unsaved_at AS unsaved_at
FROM
sources
JOIN (
SELECT
sources_1.id AS id
FROM
sources AS sources_1
JOIN
GROUPS
ON
GROUPS.id = sources_1.group_id
JOIN
group_users AS group_users_1
ON
GROUPS.id = group_users_1.group_id
JOIN
users
ON
users.id = group_users_1.user_id
WHERE
users.id = 3) AS anon_2
ON
anon_2.id = sources.id
JOIN (
SELECT
sources_2.id AS id
FROM
sources AS sources_2
JOIN
GROUPS
ON
GROUPS.id = sources_2.group_id
JOIN (
SELECT
GROUPS.id AS id
FROM
GROUPS) AS anon_4
ON
anon_4.id = GROUPS.id
JOIN
objs
ON
objs.id = sources_2.obj_id
JOIN (
SELECT
objs.id AS id
FROM
objs) AS anon_5
ON
anon_5.id = objs.id) AS anon_3
ON
anon_3.id = sources.id
WHERE
sources.obj_id = 'ZTF21aastzsn'
AND sources.active IS TRUE) AS anon_1
ON
GROUPS.id = anon_1.group_id
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=22678.34..30571.60 rows=1 width=41) (actual time=385.595..628.802 rows=3 loops=1)
-> Gather (cost=22678.07..30563.25 rows=1 width=4) (actual time=385.576..628.746 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=21678.07..29563.15 rows=1 width=4) (actual time=535.870..616.328 rows=1 loops=3)
Hash Cond: (sources_2.id = sources_1.id)
-> Merge Join (cost=21652.55..29461.96 rows=20175 width=4) (actual time=362.537..613.134 rows=15015 loops=3)
Merge Cond: (sources_2.group_id = groups_2.id)
-> Nested Loop (cost=21632.72..31243.00 rows=20175 width=12) (actual time=362.373..607.957 rows=15015 loops=3)
-> Merge Join (cost=21632.30..21914.67 rows=20175 width=38) (actual time=362.330..373.630 rows=15015 loops=3)
Merge Cond: (sources_2.group_id = groups_3.id)
-> Sort (cost=21612.47..21662.90 rows=20175 width=34) (actual time=362.097..365.817 rows=15015 loops=3)
Sort Key: sources_2.group_id
Sort Method: quicksort Memory: 1373kB
Worker 0: Sort Method: quicksort Memory: 2361kB
Worker 1: Sort Method: quicksort Memory: 1323kB
-> Merge Join (cost=0.84..20169.92 rows=20175 width=34) (actual time=1.065..353.887 rows=15015 loops=3)
Merge Cond: ((objs.id)::text = (sources_2.obj_id)::text)
-> Parallel Index Only Scan using objs_pkey on objs (cost=0.42..15916.49 rows=210318 width=13) (actual time=0.052..132.598 rows=168577 loops=3)
Heap Fetches: 327764
-> Index Scan using sources_reverse_ind on sources sources_2 (cost=0.41..4760.97 rows=48420 width=21) (actual time=0.012..40.221 rows=45032 loops=3)
-> Sort (cost=19.83..20.43 rows=239 width=4) (actual time=0.226..0.272 rows=242 loops=3)
Sort Key: groups_3.id
Sort Method: quicksort Memory: 37kB
Worker 0: Sort Method: quicksort Memory: 37kB
Worker 1: Sort Method: quicksort Memory: 37kB
-> Seq Scan on groups groups_3 (cost=0.00..10.39 rows=239 width=4) (actual time=0.043..0.129 rows=256 loops=3)
-> Index Only Scan using objs_pkey on objs objs_1 (cost=0.42..0.46 rows=1 width=13) (actual time=0.015..0.015 rows=1 loops=45046)
Index Cond: (id = (objs.id)::text)
Heap Fetches: 51053
-> Sort (cost=19.83..20.43 rows=239 width=4) (actual time=0.160..0.205 rows=242 loops=3)
Sort Key: groups_2.id
Sort Method: quicksort Memory: 37kB
Worker 0: Sort Method: quicksort Memory: 37kB
Worker 1: Sort Method: quicksort Memory: 37kB
-> Seq Scan on groups groups_2 (cost=0.00..10.39 rows=239 width=4) (actual time=0.005..0.066 rows=256 loops=3)
-> Hash (cost=25.50..25.50 rows=1 width=12) (actual time=0.395..0.401 rows=3 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Nested Loop (cost=1.40..25.50 rows=1 width=12) (actual time=0.247..0.389 rows=3 loops=3)
-> Nested Loop (cost=1.25..17.33 rows=1 width=16) (actual time=0.189..0.326 rows=3 loops=3)
-> Nested Loop (cost=0.98..17.04 rows=1 width=24) (actual time=0.133..0.263 rows=3 loops=3)
-> Nested Loop (cost=0.70..16.74 rows=1 width=16) (actual time=0.093..0.174 rows=7 loops=3)
-> Index Scan using sources_reverse_ind on sources (cost=0.41..8.43 rows=1 width=8) (actual time=0.061..0.100 rows=7 loops=3)
Index Cond: ((obj_id)::text = 'ZTF21aastzsn'::text)
Filter: (active IS TRUE)
-> Index Scan using sources_pkey on sources sources_1 (cost=0.29..8.31 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=21)
Index Cond: (id = sources.id)
-> Index Only Scan using group_users_forward_ind on group_users group_users_1 (cost=0.28..0.30 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=21)
Index Cond: ((group_id = sources_1.group_id) AND (user_id = 3))
Heap Fetches: 9
-> Index Only Scan using groups_pkey on groups groups_1 (cost=0.27..0.29 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=9)
Index Cond: (id = sources_1.group_id)
Heap Fetches: 9
-> Index Only Scan using users_pkey on users (cost=0.14..8.16 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=9)
Index Cond: (id = 3)
Heap Fetches: 9
-> Index Scan using groups_pkey on groups (cost=0.27..8.29 rows=1 width=41) (actual time=0.011..0.012 rows=1 loops=3)
Index Cond: (id = sources.group_id)
Planning Time: 2.938 ms
Execution Time: 628.949 ms
(60 rows)
This one is slow because of this public query here:
JOIN (
SELECT
objs.id AS id
FROM
objs) AS anon_5
ON
anon_5.id = objs.id)
If we remove that part (which does nothing), the query speeds up by ~1000x:
SELECT
GROUPS.id AS groups_id,
GROUPS.created_at AS groups_created_at,
GROUPS.modified AS groups_modified,
GROUPS.name AS groups_name,
GROUPS.nickname AS groups_nickname,
GROUPS.private AS groups_private,
GROUPS.single_user_group AS groups_single_user_group
FROM
GROUPS
JOIN (
SELECT
sources.created_at AS created_at,
sources.modified AS modified,
sources.id AS id,
sources.group_id AS group_id,
sources.obj_id AS obj_id,
sources.saved_by_id AS saved_by_id,
sources.saved_at AS saved_at,
sources.active AS active,
sources.requested AS requested,
sources.unsaved_by_id AS unsaved_by_id,
sources.unsaved_at AS unsaved_at
FROM
sources
JOIN (
SELECT
sources_1.id AS id
FROM
sources AS sources_1
JOIN
GROUPS
ON
GROUPS.id = sources_1.group_id
JOIN
group_users AS group_users_1
ON
GROUPS.id = group_users_1.group_id
JOIN
users
ON
users.id = group_users_1.user_id
WHERE
users.id = 3) AS anon_2
ON
anon_2.id = sources.id
JOIN (
SELECT
sources_2.id AS id
FROM
sources AS sources_2
JOIN
GROUPS
ON
GROUPS.id = sources_2.group_id
JOIN
objs
ON
objs.id = sources_2.obj_id) AS anon_3
ON
anon_3.id = sources.id
WHERE
sources.obj_id = 'ZTF21aastzsn'
AND sources.active IS TRUE) AS anon_1
ON
GROUPS.id = anon_1.group_id
** QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.65..35.20 rows=1 width=41) (actual time=0.090..0.180 rows=3 loops=1)
-> Nested Loop (cost=2.38..26.85 rows=1 width=4) (actual time=0.087..0.173 rows=3 loops=1)
-> Nested Loop (cost=1.96..26.17 rows=1 width=17) (actual time=0.066..0.135 rows=3 loops=1)
-> Nested Loop (cost=1.69..25.88 rows=1 width=21) (actual time=0.063..0.128 rows=3 loops=1)
-> Nested Loop (cost=1.40..25.50 rows=1 width=12) (actual time=0.059..0.120 rows=3 loops=1)
-> Nested Loop (cost=1.25..17.33 rows=1 width=16) (actual time=0.047..0.103 rows=3 loops=1)
-> Nested Loop (cost=0.98..17.04 rows=1 width=24) (actual time=0.042..0.094 rows=3 loops=1)
-> Nested Loop (cost=0.70..16.74 rows=1 width=16) (actual time=0.035..0.062 rows=7 loops=1)
-> Index Scan using sources_reverse_ind on sources (cost=0.41..8.43 rows=1 width=8) (actual time=0.025..0.035 rows=7 loops=1)
Index Cond: ((obj_id)::text = 'ZTF21aastzsn'::text)
Filter: (active IS TRUE)
-> Index Scan using sources_pkey on sources sources_1 (cost=0.29..8.31 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=7)
Index Cond: (id = sources.id)
-> Index Only Scan using group_users_forward_ind on group_users group_users_1 (cost=0.28..0.30 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=7)
Index Cond: ((group_id = sources_1.group_id) AND (user_id = 3))
Heap Fetches: 3
-> Index Only Scan using groups_pkey on groups groups_1 (cost=0.27..0.29 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=3)
Index Cond: (id = sources_1.group_id)
Heap Fetches: 3
-> Index Only Scan using users_pkey on users (cost=0.14..8.16 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=3)
Index Cond: (id = 3)
Heap Fetches: 3
-> Index Scan using sources_pkey on sources sources_2 (cost=0.29..0.38 rows=1 width=21) (actual time=0.002..0.002 rows=1 loops=3)
Index Cond: (id = sources_1.id)
-> Index Only Scan using groups_pkey on groups groups_2 (cost=0.27..0.29 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=3)
Index Cond: (id = sources_2.group_id)
Heap Fetches: 3
-> Index Only Scan using objs_pkey on objs (cost=0.42..0.67 rows=1 width=13) (actual time=0.012..0.012 rows=1 loops=3)
Index Cond: (id = (sources_2.obj_id)::text)
Heap Fetches: 3
-> Index Scan using groups_pkey on groups (cost=0.27..8.29 rows=1 width=41) (actual time=0.001..0.001 rows=1 loops=3)
Index Cond: (id = sources.group_id)
Planning Time: 3.213 ms
Execution Time: 0.257 ms
(34 rows)
This PR removes these useless joins that slow down queries automatically from all future AccessibleIfRelatedRowsAreAccessible logic.
Current queries that are slow on SkyPortal have access policies that are instances of
AccessibleIfRelatedRowsAreAccessible(public_relationship=access_mode)
, that is to say, anything that requires the ability to access a related record that itself has a public access policy.The reason this is slow is because joining against publicly accessible records renders the entire ID column of the associated table as a subquery, which is slow. In reality, this join doesn't need to be performed, as it does not filter the output of the calling join in any way.
Example:
This query for accessible annotations is currently slow. The associated policy is:
This is the resulting query:
This query takes about 1s to execute:
The reason this query is slow is because of the following part of the query:
This is the part of the query that enforces
AccessibleIfRelatedRowsAreAccessible(obj='read')
. It has no effect since Objs are already ready public.If we remove this, the query is 800x faster (whole thing takes 1ms):
Similar for the next longest query:
This one is slow because of this public query here:
If we remove that part (which does nothing), the query speeds up by ~1000x:
This PR removes these useless joins that slow down queries automatically from all future
AccessibleIfRelatedRowsAreAccessible
logic.