Ok, we're hitting a major bottleneck. One of our projects has 298,477 audio_recordings in it (17% of the entire collection!), and suddenly our assumptions about which queries are efficient enough no longer hold.
When a bunch of these filter queries hit the server, their normal ~8-second execution time blows out to 255+ seconds. The server locks up and the workbench becomes unresponsive.
(The database server's CPU is pegged at 100%.)
The query at fault has an average execution time of 7 to 8 seconds and is:
SELECT "audio_recordings"."id",
"audio_recordings"."site_id",
"audio_recordings"."recorded_date",
"audio_recordings"."media_type",
"sites"."name" AS "sites.name"
FROM "audio_recordings"
INNER JOIN "sites" "sites_audio_recordings" ON ("sites_audio_recordings"."deleted_at" IS NULL)
AND ("sites_audio_recordings"."id" = "audio_recordings"."site_id")
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
WHERE ("audio_recordings"."deleted_at" IS NULL)
AND (EXISTS
(SELECT 1
FROM "projects_sites"
INNER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
WHERE "projects_sites"."site_id" = "sites"."id"))
AND ("audio_recordings"."id" IN
(SELECT "audio_recordings"."id"
FROM "audio_recordings"
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
WHERE "projects"."id" = 1167))
AND ("audio_recordings"."created_at" < '2024-03-18T00:00:00')
ORDER BY "audio_recordings"."id" ASC
LIMIT 25
OFFSET 0
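To pin down where the time goes, the plan can be captured with EXPLAIN, either for the full query or for the IN subquery in isolation as sketched below. This is standard PostgreSQL (which the quoted identifiers suggest); the actual plan output isn't reproduced here.

-- Diagnostic sketch: time the IN subquery on its own, with per-node timings and buffer counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "audio_recordings"."id"
FROM "audio_recordings"
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
WHERE "projects"."id" = 1167;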
Rewriting the query to:
WITH filter_conditions AS (
SELECT "audio_recordings"."id"
FROM "audio_recordings"
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
WHERE "projects"."id" = 1167
AND (
"audio_recordings"."created_at" < '2024-03-18T00:00:00'
)
),
permissions AS (
SELECT "audio_recordings"."id"
FROM "audio_recordings"
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
WHERE "projects"."id" = 1167 AND 1=1
)
SELECT "audio_recordings"."id",
"audio_recordings"."site_id",
"audio_recordings"."recorded_date",
"audio_recordings"."media_type",
"sites"."name" AS "sites.name"
FROM "audio_recordings"
INNER JOIN "sites" "sites_audio_recordings" ON ("sites_audio_recordings"."deleted_at" IS NULL)
AND (
"sites_audio_recordings"."id" = "audio_recordings"."site_id"
)
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
LEFT OUTER JOIN "filter_conditions" ON "audio_recordings"."id" = "filter_conditions"."id"
INNER JOIN "permissions" ON "permissions"."id" = "audio_recordings"."id"
WHERE ("audio_recordings"."deleted_at" IS NULL)
ORDER BY "audio_recordings"."id" ASC
LIMIT 25 OFFSET 0
Results in just 1.7 seconds of execution time.
So, by using CTEs we see a 4× or better performance improvement. This should also help with the execution-time blowout when the server is overloaded.
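One caveat, assuming this runs on PostgreSQL 12 or newer: CTEs are no longer materialized automatically, so the planner is free to inline them and potentially fall back to a plan like the original. If the speedup relies on the CTEs acting as optimization fences, they can be pinned explicitly. A minimal sketch of the first CTE (not what we currently deploy):

-- MATERIALIZED forces the CTE to be computed once, as pre-12 PostgreSQL did by default.
WITH filter_conditions AS MATERIALIZED (
    SELECT "audio_recordings"."id"
    FROM "audio_recordings"
    LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
    LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
    LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
    WHERE "projects"."id" = 1167
      AND "audio_recordings"."created_at" < '2024-03-18T00:00:00'
)
-- Stand-alone check that just exercises the CTE; in the real query the permissions CTE
-- and the main SELECT follow unchanged, each also marked MATERIALIZED.
SELECT count(*)
FROM filter_conditions;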
We've tried throwing more CPU at the problem (doubling the DB core count) and running maintenance tasks (VACUUM, ANALYZE, reindexing), to little effect.
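For anyone wanting to rule out the same things first, that kind of maintenance pass on PostgreSQL looks roughly like this (a sketch; the exact invocations we ran may have differed):

-- Reclaim dead tuples and refresh planner statistics for the tables involved, then rebuild indexes.
VACUUM (VERBOSE, ANALYZE) "audio_recordings";
VACUUM (VERBOSE, ANALYZE) "sites";
VACUUM (VERBOSE, ANALYZE) "projects_sites";
REINDEX TABLE "audio_recordings";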