QutEcoacoustics / baw-server

The acoustic workbench server for storing and managing ecoacoustic data. Manages the structure and audio data. Provides an API for clients access.
Apache License 2.0
9 stars 4 forks source link

filter query is hugely inefficient for very large datasets #671

Open atruskie opened 7 months ago

atruskie commented 7 months ago

Ok, we're hitting a major bottleneck. One of our projects has 298,477 audio_recordings in it (%17 of the entire collection!). And suddenly our assumptions about which queries are efficient enough are no longer valid image

When a bunch of these filter queries hit the server, their normal 8 second execution time blows out to 255+ seconds. The server locks up and the workbench becomes unresponsive.

image

(100% pegged CPU is database server)

The query at fault has an average execution time of 7 to 8 seconds and is:

SELECT "audio_recordings"."id",
    "audio_recordings"."site_id",
    "audio_recordings"."recorded_date",
    "audio_recordings"."media_type",
    "sites"."name" AS "sites.name"
FROM "audio_recordings"
INNER JOIN "sites" "sites_audio_recordings" ON ("sites_audio_recordings"."deleted_at" IS NULL)
AND ("sites_audio_recordings"."id" = "audio_recordings"."site_id")
LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
WHERE ("audio_recordings"."deleted_at" IS NULL)
    AND (EXISTS
                            (SELECT 1
                                FROM "projects_sites"
                                INNER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
                                WHERE "projects_sites"."site_id" = "sites"."id"))
    AND ("audio_recordings"."id" IN
                            (SELECT "audio_recordings"."id"
                                FROM "audio_recordings"
                                LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
                                LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
                                LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
                                WHERE "projects"."id" = 1167))
    AND ("audio_recordings"."created_at" < '2024-03-18T00:00:00')
ORDER BY "audio_recordings"."id" ASC
LIMIT 25
OFFSET 0

Rewriting the query to:

with filter_conditions AS (
    SELECT "audio_recordings"."id"
    FROM "audio_recordings"
        LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
        LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
        LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
    WHERE "projects"."id" = 1167
        AND (
            "audio_recordings"."created_at" < '2024-03-18T00:00:00'
        )
),
permissions AS (
    SELECT "audio_recordings"."id"
    FROM "audio_recordings"
        LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
        LEFT OUTER JOIN "projects_sites" ON "sites"."id" = "projects_sites"."site_id"
        LEFT OUTER JOIN "projects" ON "projects_sites"."project_id" = "projects"."id"
    WHERE "projects"."id" = 1167 and 1=1
)

SELECT "audio_recordings"."id",
    "audio_recordings"."site_id",
    "audio_recordings"."recorded_date",
    "audio_recordings"."media_type",
    "sites"."name" AS "sites.name"
FROM "audio_recordings"
    INNER JOIN "sites" "sites_audio_recordings" ON ("sites_audio_recordings"."deleted_at" IS NULL)
    AND (
        "sites_audio_recordings"."id" = "audio_recordings"."site_id"
    )
    LEFT OUTER JOIN "sites" ON "audio_recordings"."site_id" = "sites"."id"
    LEFT OUTER JOIN "filter_conditions" ON "audio_recordings"."id" = "filter_conditions"."id"
    INNER JOIN "permissions" ON "permissions"."id" = "audio_recordings"."id"
WHERE ("audio_recordings"."deleted_at" IS NULL)
ORDER BY "audio_recordings"."id" ASC
LIMIT 25 OFFSET 0

Results in just 1.7 seconds of execution time.

So, by using CTEs we can see a 4✕ or more performance improvement. This should also help with execution time blowout when the server is overloaded.

We've tried throwing more CPU at the problem (double DB core count) and doing maintenance tasks (VACUUM, ANALYZE, reindexing) to little effect.