apache / age

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.
https://age.apache.org
Apache License 2.0
3k stars 403 forks source link

Can we improve query performance? #1902

Open sanket-uptycs opened 3 months ago

sanket-uptycs commented 3 months ago

I am trying following query. Can we improve query performance?

SELECT * FROM cypher('graphdb', $$
     MATCH (n1:node1 {SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})
     MATCH (n2:node2 {SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})
     UNWIND n1.ArrayOfStrings as props
     WITH props, n1, n2
     WHERE toLower(props) = toLower(n2.ResourceId)
     MERGE (n1{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})-[r:ATTACHED_TO{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}]->(n2{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2
 cdcf6290001"})
     SET r.batchId = 1717068002, r.SubscriptionId = "8da31d20-daf9-42ad-bf7f-2cdcf6290001"
 $$) as (a agtype);

I have gin index on node1 node2 and ATTACHED_TO edge. But it is taking Planning Time: 0.309 ms Execution Time: 44548.615 ms Here is the query plan.

 Custom Scan (Cypher Set)  (cost=0.00..0.00 rows=0 width=32) (actual time=44537.778..44537.782 rows=0 loops=1)
   ->  Subquery Scan on cypher  (cost=0.00..0.00 rows=1 width=32) (actual time=8.018..43411.094 rows=3000 loops=1)
         ->  Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=256) (actual time=8.017..43410.376 rows=3000 loops=1)
               ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=180.03..4370.67 rows=810 width=64) (actual time=8.014..43408.357 rows=3000 loops=1)
                     ->  Hash Left Join  (cost=180.03..4362.57 rows=810 width=160) (actual time=8.014..43407.518 rows=3000 loops=1)
                           Join Filter: ((age_properties(_age_default_alias_previous_cypher_clause_1.n1) @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype) AND (age_properties(_age_default_alias_previous_cypher_clause_1.n2) @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype))
                           ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1  (cost=48.21..4206.43 rows=810 width=96) (actual time=0.900..43278.062 rows=3000 loops=1)
                                 Filter: (age_tolower(_age_default_alias_previous_cypher_clause_1.props) = age_tolower(agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause_1.n2, '"ResourceId"'::agtype])))
                                 Rows Removed by Filter: 8997000
                                 ->  ProjectSet  (cost=48.21..966.43 rows=162000 width=96) (actual time=0.556..17782.350 rows=9000000 loops=1)
                                       ->  Nested Loop  (cost=48.21..152.38 rows=162 width=2796) (actual time=0.536..454.720 rows=4500000 loops=1)
                                             ->  Bitmap Heap Scan on "node2" n2  (cost=28.14..95.85 rows=18 width=1060) (actual time=0.349..4.000 rows=3000 loops=1)
                                                   Recheck Cond: (properties @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype)
                                                   Heap Blocks: exact=436
                                                   ->  Bitmap Index Scan on index_name2  (cost=0.00..28.14 rows=18 width=0) (actual time=0.305..0.305 rows=3000 loops=1)
                                                         Index Cond: (properties @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype)
                                             ->  Materialize  (cost=20.07..54.52 rows=9 width=1736) (actual time=0.000..0.057 rows=1500 loops=3000)
                                                   ->  Bitmap Heap Scan on "node1" n1  (cost=20.07..54.48 rows=9 width=1736) (actual time=0.183..1.288 rows=1500 loops=1)
                                                         Recheck Cond: (properties @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype)
                                                         Heap Blocks: exact=383
                                                         ->  Bitmap Index Scan on index_name  (cost=0.00..20.07 rows=9 width=0) (actual time=0.149..0.149 rows=1500 loops=1)
                                                               Index Cond: (properties @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype)
                           ->  Hash  (cost=131.40..131.40 rows=28 width=48) (actual time=7.053..7.054 rows=4600 loops=1)
                                 Buckets: 8192 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1358kB
                                 ->  Bitmap Heap Scan on "ATTACHED_TO" r  (cost=36.21..131.40 rows=28 width=48) (actual time=0.617..6.157 rows=4600 loops=1)
                                       Recheck Cond: (properties @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype)
                                       Heap Blocks: exact=442
                                       ->  Bitmap Index Scan on index_name3  (cost=0.00..36.21 rows=28 width=0) (actual time=0.553..0.553 rows=4665 loops=1)
                                             Index Cond: (properties @> '{"SubscriptionId": "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}'::agtype)

select count(*) from graphdb._ag_label_edge; | 324030 |

select count(*) from graphdb._ag_label_vertex; | 220385 | Number of Node and Edge types : 50+

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 60 days with no activity. Remove "Abondoned" label or comment or this will be closed in 14 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for further 14 days with no activity.

jrgemignani commented 4 weeks ago

@sanket-uptycs We are looking into performance improvements in general.

I will point out that the below query is a bit redundant by using MERGE and SET. MERGE is basically MATCH X and if X doesn't exist CREATE X. Since you have already MATCHed them, you don't need to do it again. Additionally, since you are creating a new edge, you don't need to SET it.

SELECT * FROM cypher('graphdb', $$
     MATCH (n1:node1 {SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})
     MATCH (n2:node2 {SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})
     UNWIND n1.ArrayOfStrings as props
     WITH props, n1, n2
     WHERE toLower(props) = toLower(n2.ResourceId)
     MERGE (n1{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})-[r:ATTACHED_TO{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"}]->(n2{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2
 cdcf6290001"})
     SET r.batchId = 1717068002, r.SubscriptionId = "8da31d20-daf9-42ad-bf7f-2cdcf6290001"
 $$) as (a agtype);

I believe this is equivalent, and likely faster -

SELECT * FROM cypher('graphdb', $$
     MATCH (n1:node1 {SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})
     MATCH (n2:node2 {SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001"})
     UNWIND n1.ArrayOfStrings as props
     WITH props, n1, n2
     WHERE toLower(props) = toLower(n2.ResourceId)
     CREATE (n1)-[r:ATTACHED_TO{SubscriptionId: "8da31d20-daf9-42ad-bf7f-2cdcf6290001", batchId: 1717068002}]->(n2)
 $$) as (a agtype);