Open jelly opened 1 month ago
Can we just precompute it via cronjob every idk 20 min and serve the result? 😆
So some analysis, locally nothing is shown because I need to tweak the DEFAULT_CUTOFF
to DEFAULT_CUTOFF = timedelta(hours=100)
in mirrors/utils.py`
SELECT "mirrors_mirrorlog"."url_id",
"mirrors_mirrorlog"."error",
COUNT("mirrors_mirrorlog"."error") AS "error_count",
MAX("mirrors_mirrorlog"."check_time") AS "last_occurred"
FROM "mirrors_mirrorlog"
INNER JOIN "mirrors_mirrorurl"
ON ("mirrors_mirrorlog"."url_id" = "mirrors_mirrorurl"."id")
INNER JOIN "mirrors_mirror"
ON ("mirrors_mirrorurl"."mirror_id" = "mirrors_mirror"."id")
WHERE ("mirrors_mirrorlog"."check_time" >= '2024-07-18T14:12:56.565672+00:00'::timestamptz AND NOT "mirrors_mirrorlog"."is_success" AND "mirrors_mirror"."public" AND "mirrors_mirrorurl"."active" AND "mirrors_mirror"."active" AND "mirrors_mirror"."public")
GROUP BY "mirrors_mirrorlog"."url_id",
"mirrors_mirrorlog"."error"
ORDER BY 4 DESC,
3 DESC
This query locally takes 500 ms(!), on our live env the page consistenly loads in ~ 2 seconds.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=525187.86..525234.16 rows=18518 width=31)
Sort Key: (max(mirrors_mirrorlog.check_time)) DESC, (count(mirrors_mirrorlog.error)) DESC
-> GroupAggregate (cost=413.78..523875.25 rows=18518 width=31)
Group Key: mirrors_mirrorlog.url_id, mirrors_mirrorlog.error
-> Incremental Sort (cost=413.78..523504.89 rows=18518 width=23)
Sort Key: mirrors_mirrorlog.url_id, mirrors_mirrorlog.error
Presorted Key: mirrors_mirrorlog.url_id
-> Nested Loop (cost=1.13..522889.88 rows=18518 width=23)
-> Nested Loop (cost=0.56..889.93 rows=1027 width=4)
-> Index Scan using mirrors_mirrorurl_pkey on mirrors_mirrorurl (cost=0.28..310.97 rows=1685 width=8)
Filter: active
-> Memoize (cost=0.29..0.75 rows=1 width=4)
Cache Key: mirrors_mirrorurl.mirror_id
Cache Mode: logical
-> Index Scan using mirrors_mirror_pkey on mirrors_mirror (cost=0.28..0.74 rows=1 width=4)
Index Cond: (id = mirrors_mirrorurl.mirror_id)
Filter: (public AND active AND public)
-> Index Scan using mirrors_mirrorlog_url_id_check_time on mirrors_mirrorlog (cost=0.56..508.02 rows=26 width=23)
Index Cond: ((url_id = mirrors_mirrorurl.id) AND (check_time >= '2024-07-18 16:12:56.565672+02'::timestamp with time zone))
Filter: (NOT is_success)
JIT:
Functions: 24
Options: Inlining true, Optimization true, Expressions true, Deforming true
(23 rows)
This is get_mirror_errors
which is not cached by us and thus explains why the page loads slowly.
get_mirror_statuses
is also slow ~ 300 ms locally:
archweb=> explain SELECT l.url_id,
u.mirror_id,
COUNT(l.id) AS check_count,
COUNT(l.last_sync) AS success_count,
MAX(l.last_sync) AS last_sync,
MAX(l.check_time) AS last_check,
AVG(l.duration) AS duration_avg,
STDDEV(l.duration) AS duration_stddev,
AVG(check_time - last_sync) AS delay
FROM mirrors_mirrorlog l
JOIN mirrors_mirrorurl u
ON u.id = l.url_id
WHERE l.check_time >= '2024-07-18T14:12:56.193089+00:00'::timestamptz
GROUP BY l.url_id,
u.mirror_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=617361.15..623734.47 rows=192402 width=72)
Group Key: l.url_id, u.mirror_id
Planned Partitions: 8
-> Nested Loop (cost=0.85..597158.94 rows=192402 width=36)
-> Index Scan using mirrors_mirrorlog_check_time on mirrors_mirrorlog l (cost=0.56..591390.08 rows=192402 width=32)
Index Cond: (check_time >= '2024-07-18 16:12:56.193089+02'::timestamp with time zone)
-> Memoize (cost=0.29..0.77 rows=1 width=8)
Cache Key: l.url_id
Cache Mode: logical
-> Index Scan using mirrors_mirrorurl_pkey on mirrors_mirrorurl u (cost=0.28..0.76 rows=1 width=8)
Index Cond: (id = l.url_id)
JIT:
Functions: 14
Options: Inlining true, Optimization true, Expressions true, Deforming true
(14 rows)
Applied a workaround for now in 6edcaaa474f26db79b7434c3d0ccce461b0bd89a
have you tried
SELECT "mirrors_mirrorlog"."url_id",
"mirrors_mirrorlog"."error",
COUNT("mirrors_mirrorlog"."error") AS "error_count",
MAX("mirrors_mirrorlog"."check_time") AS "last_occurred"
FROM "mirrors_mirrorlog"
INNER JOIN "mirrors_mirrorurl"
ON ("mirrors_mirrorlog"."url_id" = "mirrors_mirrorurl"."id" AND "mirrors_mirrorurl"."active")
INNER JOIN "mirrors_mirror"
ON ("mirrors_mirrorurl"."mirror_id" = "mirrors_mirror"."id" AND "mirrors_mirror"."public" AND "mirrors_mirror"."active")
WHERE "mirrors_mirrorlog"."check_time" >= '2024-07-18T14:12:56.565672+00:00'::timestamptz
AND NOT "mirrors_mirrorlog"."is_success"
GROUP BY "mirrors_mirrorlog"."url_id",
"mirrors_mirrorlog"."error"
ORDER BY 4 DESC,
3 DESC
filtering public and active earlier should help, fingers crossed
Loading /mirrors/status/ takes ~ 2 seconds on our production env. this needs some optimisation.