apache / age

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.
https://age.apache.org
Apache License 2.0
3.11k stars 410 forks source link

Why does the cypher query takes longer than normal SQL query? #1555

Closed MatheusFarias03 closed 9 months ago

MatheusFarias03 commented 9 months ago

I have a graph with two vertex labels: Wholesaler and Product; and with one edge label: OFFERS.

There are 3426 vertices with label Product, 4 vertices with label Wholesaler, and 13326 edges with label OFFERS.

The below queries shows what are the properties of Product and OFFERS (The names of the vertices are all in Portuguese).

SELECT * FROM cypher('TestGraph', $$
    MATCH (P:Product)
    RETURN P
$$) AS (product agtype);
                                                                                                           product                                                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"id": 2814749767109987, "label": "Product", "properties": {"name": "La Vita Salada Verão", "type": "Salad", "description": "1 X 200 g"}}::vertex
 {"id": 2814749767109988, "label": "Product", "properties": {"name": "La Vita Salada Baby Summer Gourmet", "type": "Salad", "description": "1 X 100 g"}}::vertex
 {"id": 2814749767109989, "label": "Product", "properties": {"name": "Daucy Salada Verão Congelada", "type": "Salad", "description": "1 X 300 g"}}::vertex
 {"id": 2814749767109990, "label": "Product", "properties": {"name": "Hiromi Salada Americana", "type": "Salad", "description": "1 X 170 g"}}::vertex
 {"id": 2814749767109991, "label": "Product", "properties": {"name": "Mix Tropical Higienizado Oba Bem Querer", "type": "Salad", "description": "1 X 160 g"}}::vertex

(...)
SELECT * FROM cypher('TestGraph', $$
MATCH ()-[E:OFFERS]->()
RETURN E
$$) AS (offer agtype);

                                                                                          offer                                                                                           
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"id": 2251799813698730, "label": "OFFERS", "end_id": 1970324836975960, "start_id": 844424930132000, "properties": {"price": "R$ 44,00"}}::edge
 {"id": 2251799813698731, "label": "OFFERS", "end_id": 1970324836975961, "start_id": 844424930132000, "properties": {"price": "R$ 42,00"}}::edge
 {"id": 2251799813698732, "label": "OFFERS", "end_id": 1970324836975962, "start_id": 844424930132000, "properties": {"price": "R$ 38,00"}}::edge
 {"id": 2251799813698733, "label": "OFFERS", "end_id": 1970324836975963, "start_id": 844424930132000, "properties": {"price": "R$ 38,00"}}::edge

(...)

The following queries are focused on retrieving information about product offerings from wholesalers, filtering for products that contain the word 'Vegano' ( 'Vegan' in Portuguese ) in the name.

-- With Cypher syntax.
WITH graph_query as (
    SELECT * FROM cypher('TestGraph', $$
        MATCH ()-[E:OFFERS]->(P:Product)
        RETURN P.name, E.price ORDER BY P.name, E.price
    $$) AS (product agtype, price agtype)
) 
SELECT * FROM graph_query 
WHERE graph_query.product::text LIKE '%Vegano%';

-- Time: 173.787 ms

-- With SQL syntax only.
SELECT o.id as offer_id, 
       w.properties->>'name' as wholesaler_name, 
       p.properties->>'name' as product_name, 
       o.properties->>'price' as product_price
FROM "TestGraph"."OFFERS" o
JOIN "TestGraph"."Wholesaler" w ON o.start_id = w.id
JOIN "TestGraph"."Product" p ON o.end_id = p.id
WHERE p.properties->>'name' LIKE '%Vegano%';

-- Time: 24.168 ms

Although I find a bit more understandable with the cypher syntax, the SQL syntax returns the same result 7.19 times faster.

The query plan for the SQL syntax only is:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Hash Join  (cost=110.49..478.78 rows=1 width=104)
   Hash Cond: (o.start_id = w.id)
   ->  Hash Join  (cost=109.40..477.66 rows=3 width=168)
         Hash Cond: (o.end_id = p.id)
         ->  Seq Scan on "OFFERS" o  (cost=0.00..318.26 rows=13326 width=80)
         ->  Hash  (cost=109.39..109.39 rows=1 width=104)
               ->  Seq Scan on "Product" p  (cost=0.00..109.39 rows=1 width=104)
                     Filter: ((properties ->> 'name'::text) ~~ '%Vegano%'::text)
   ->  Hash  (cost=1.04..1.04 rows=4 width=104)
         ->  Seq Scan on "Wholesaler" w  (cost=0.00..1.04 rows=4 width=104)

And the query plan for the Cypher syntax is:

                               QUERY PLAN                               
------------------------------------------------------------------------
 CTE Scan on graph_query  (cost=10.00..35.00 rows=1 width=64)
   Filter: ((product)::text ~~ '%Vegano%'::text)
   CTE graph_query
     ->  Function Scan on cypher  (cost=0.00..10.00 rows=1000 width=64)

I'm building a website with AGE for my final project in college and I wanted to better understand why the SQL query is faster than the one using Cypher.

jrgemignani commented 9 months ago

@MatheusFarias03 To find the correct query plan for Cypher, prefix the Cypher command, inside the cypher() function call with EXPLAIN. Something like -

SELECT * FROM cypher('xyz', $$ EXPLAIN MATCH (u) RETURN u $$) AS (node agtype)

jrgemignani commented 9 months ago

@MatheusFarias03 I'm not sure why you want to use this query. You are essentially calling cypher to find the items and then using SQL to get the ones with 'Vegano' in the name.

-- With Cypher syntax.
WITH graph_query as (
    SELECT * FROM cypher('TestGraph', $$
        MATCH ()-[E:OFFERS]->(P:Product)
        RETURN P.name, E.price ORDER BY P.name, E.price
    $$) AS (product agtype, price agtype)
) 
SELECT * FROM graph_query 
WHERE graph_query.product::text LIKE '%Vegano%';

Something like the following should work just the same, but without the wrapping SQL -

SELECT * FROM cypher('TestGraph', $$
        MATCH ()-[E:OFFERS]->(P:Product)
        WHERE P.name =~ 'Vegano'
        RETURN P.name, E.price ORDER BY P.name, E.price
    $$) AS (product agtype, price agtype)
MatheusFarias03 commented 9 months ago

Thank you John! This last query is much better (it took the same time as the SQL only). I didn't know or remember using =~ before, that's why I used WHERE graph_query.product::text LIKE '%Vegano%';.

jrgemignani commented 9 months ago

@MatheusFarias03 Not a problem, it isn't a well known command. It is roughly related to LIKE but more powerful because it uses regex instead -

PG_FUNCTION_INFO_V1(age_eq_tilde);
/*
 * Execution function for =~ aka regular expression comparisons
 *
 * Note: Everything must resolve to 2 agtype strings. All others types are
 * errors.
 */
Datum age_eq_tilde(PG_FUNCTION_ARGS)

It calls the PG function textregexeq

jrgemignani commented 9 months ago

@MatheusFarias03 Btw, if this satisfactorily answers your question, consider closing the ticket :)

MatheusFarias03 commented 9 months ago

Sure! Thank you for the help John!