Move path functions to CTE

Explanation

We can materialize the results of the shortest path function, which saves computation time if the function is referenced multiple times in the same query. The trade-off is that we must connect the source and destination IDs within the Common Table Expression (CTE) to the main part of the query when referencing other columns. In the query below, we want to return the first and last names for both the source and destination. These columns are referenced in the main query, while the shortest paths for these pairs are computed in the CTE. We "connect" these by performing a join on the source rowid and destination rowid.

Shortest Path CTE Query

WITH shortest_path_p AS MATERIALIZED (
    SELECT shortestpath(0, (SELECT count(*) FROM Person), p1.rowid, p2.rowid) AS path, 
           p1.rowid AS src_id, 
           p2.rowid AS dst_id
    FROM Person p1
    JOIN Person p2 ON p1.rowid <> p2.rowid
    CROSS JOIN (SELECT count(cte1.temp) * 0 AS temp FROM cte1) __x
    WHERE p1.firstName = 'John' AND __x.temp * 0 = 0
)
SELECT path, 
       p1.firstName || ' ' || p1.lastName AS src_name, 
       p2.firstName || ' ' || p2.lastName AS dst_name,
       len(path) // 2,
       path[1:-:2] AS vertices,
       path[2:-:2] AS edges
FROM shortest_path_p
JOIN Person p1 ON p1.rowid = src_id
JOIN Person p2 ON p2.rowid = dst_id
WHERE p1.firstName = 'John';

Performance Comparison

Initial Results for the Worst Case Scenario

Running a query starting from any person named 'John' with no lower or upper limit on the number of hops.

Old Method:

Run Time: real 678.943 s, user 1253.974703 s, sys 38.162676 s
Rows Returned: 2,614,930

CTE Method:

Run Time: real 163.173 s, user 310.194336 s, sys 9.514430 s
Rows Returned: 2,614,930

Initial results for Best Case Scenario

Only referencing the shortest path function once shows that the performance is equal between the old and new method. The shortest path UDF dominates the running time.

Old Method:

Run Time (s): real 171.237 s, user 153.170838 s, sys 9.748276 s
Rows Returned: 2,614,930

CTE Method:

Run Time (s): real 165.812 s, user 151.718586 s, sys 8.991195 s
Rows Returned: 2,614,930

Queries

Old Method Query

WITH cte1 AS (
    SELECT CREATE_CSR_EDGE(
        0,
        (SELECT count(p.id) FROM Person p),
        CAST (
            (SELECT sum(CREATE_CSR_VERTEX(
                        0,
                        (SELECT count(p.id) FROM Person p),
                        sub.dense_id,
                        sub.cnt)
                        )
            FROM (
                SELECT p.rowid AS dense_id, count(knows.Person1Id) AS cnt
                FROM Person p
                LEFT JOIN Person_knows_Person knows ON knows.Person1Id = p.id
                GROUP BY p.rowid) sub
            )
        AS BIGINT),
        p1.rowid,
        p2.rowid,
        knows.rowid
    ) AS temp
    FROM Person_knows_Person knows
    JOIN Person p1 ON p1.id = knows.Person1Id
    JOIN Person p2 ON p2.id = knows.Person2Id
)
SELECT shortestpath(0, (SELECT count(*) FROM Person), p1.rowid, p2.rowid) AS path,
       p1.firstName || ' ' || p1.lastName AS src_name, 
       p2.firstName || ' ' || p2.lastName AS dst_name,
       len(shortestpath(0, (SELECT count(*) FROM Person), p1.rowid, p2.rowid)) // 2 AS path_length,
       shortestpath(0, (SELECT count(*) FROM Person), p1.rowid, p2.rowid)[1:-:2] AS vertices,
       shortestpath(0, (SELECT count(*) FROM Person), p1.rowid, p2.rowid)[2:-:2] AS edges
FROM Person p1, Person p2, (SELECT count(cte1.temp) * 0 AS temp FROM cte1) __x
WHERE p1.firstName = 'John' AND __x.temp * 0 = 0;

CTE Method Query

WITH cte1 AS (
    SELECT CREATE_CSR_EDGE(
        0,
        (SELECT count(p.id) FROM Person p),
        CAST (
            (SELECT sum(CREATE_CSR_VERTEX(
                        0,
                        (SELECT count(p.id) FROM Person p),
                        sub.dense_id,
                        sub.cnt)
                        )
            FROM (
                SELECT p.rowid AS dense_id, count(knows.Person1Id) AS cnt
                FROM Person p
                LEFT JOIN Person_knows_Person knows ON knows.Person1Id = p.id
                GROUP BY p.rowid) sub
            )
        AS BIGINT),
        p1.rowid,
        p2.rowid,
        knows.rowid
    ) AS temp
    FROM Person_knows_Person knows
    JOIN Person p1 ON p1.id = knows.Person1Id
    JOIN Person p2 ON p2.id = knows.Person2Id
), shortest_path_p AS MATERIALIZED (
    SELECT shortestpath(0, (SELECT count(*) FROM Person), p1.rowid, p2.rowid) AS path, 
           p1.rowid AS src_id, 
           p2.rowid AS dst_id
    FROM Person p1
    JOIN Person p2 ON p1.rowid <> p2.rowid
    CROSS JOIN (SELECT count(cte1.temp) * 0 AS temp FROM cte1) __x
    WHERE p1.firstName = 'John' AND __x.temp * 0 = 0
)
SELECT path, 
       p1.firstName || ' ' || p1.lastName AS src_name, 
       p2.firstName || ' ' || p2.lastName AS dst_name,
       len(path) // 2,
       path[1:-:2] AS vertices,
       path[2:-:2] AS edges
FROM shortest_path_p
JOIN Person p1 ON p1.rowid = src_id
JOIN Person p2 ON p2.rowid = dst_id
WHERE p1.firstName = 'John';

cwida / duckpgq-extension