Closed: Kobaruon closed this issue 4 months ago.
@Kobaruon How long a query takes, or is expected to take, depends on the dataset, the query, and the machine it is running on. This particular query, while it looks simple, searches for a match regardless of direction, -[]- , and these are notoriously slow queries.
@jrgemignani It takes around 3 minutes to run; we expect around 1 second of execution time. The DB VM has 8 cores and 32 GB of memory, and the data size is small, ~600 MB.
We have a very similar query with the same table structure and the same indexes. On that table, the query below runs in under 1 second:
```sql
SELECT * from cypher('passv2', $$
MATCH (V)-[R:TCP]-(V2)
RETURN V,R,V2 LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
```
Yes, without LIMIT it takes ages to run, but with LIMIT we expect fast execution.
Edge label counts, though I don't know if that matters.
@Kobaruon It would be very nice if you could share EXPLAIN ANALYZE for query 2, the fast one, and your Postgres version :) Reading the first plan, the time was spent here:

Gather (cost=1000.57..414641535.58 rows=1372 width=96) (actual time=180465.329..182428.494 rows=100 loops=1)

I am looking at what this can actually mean and what options are available to tune this aspect of the query, if possible.
The first relevant article hinted at disabling this, but with the caution that it should only be applied locally, to a specific query. A similar but slightly different take on forcing a single thread for the query is here. Please note, this may not actually be a fix, and it needs to be tested first before attempting it in a production environment.
@MironAtHome I am using PostgreSQL 16.3.
```sql
SELECT * from cypher('passv2', $$
MATCH (V)-[R:TCP]-(V2)
RETURN V,R,V2 LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
```
I tried your recommendation and it seems to make the query faster; however, the query itself already runs faster than before without any changes, for some reason. It's still slower than we expected, but it's a great improvement.
Original: takes 19.8 sec https://explain.dalibo.com/plan/dhe464cb1579b734
Tweaked: 2.7 sec
```sql
BEGIN;
SET LOCAL parallel_leader_participation = off;
EXPLAIN ANALYZE SELECT * from cypher('passv2', $$
MATCH (V)-[R:WORKED_ON]-(V2)
RETURN V,R,V2
LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
COMMIT; -- SET LOCAL reverts once the transaction ends
```
Please also see the additional link about forcing single-threaded execution. And thank you for sharing feedback.
Sorry, I completely missed that, but it seems to increase execution time.
```sql
SET max_parallel_workers_per_gather = 0;
```
https://explain.dalibo.com/plan/8a9475c9329f762f
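A side note on scoping that may matter when comparing the two tweaks: a plain SET lasts for the rest of the session, while SET LOCAL reverts when the transaction ends. A minimal sketch of trying this second knob with the same transaction-local scoping as the earlier one:

```sql
BEGIN;
-- Transaction-local: reverts automatically at COMMIT/ROLLBACK, so the
-- rest of the session keeps its normal parallelism settings.
SET LOCAL max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE SELECT * from cypher('passv2', $$
MATCH (V)-[R:WORKED_ON]-(V2)
RETURN V,R,V2
LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
ROLLBACK; -- the setting reverts here
```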
OK, it looks like the experiment hints that `SET LOCAL parallel_leader_participation = off;` is the better-performing option.
One thing that isn't 100% clear to me is the bi-directional query design.
May I ask you to run queries like the ones below? If both produce identical results, then we can further tune the query by giving the MATCH a direction; otherwise, the improvement already available seems reasonable. However, it needs to be confirmed (only on a test instance of the server).
I am not sure how to clean up buffers on the running server. On my servers I am allowed to perform any administrative action, so I would just restart, but please see whether that is doable where you are. Usually an RDBMS has commands to clean up memory buffers so that a query runs "anew", as if it had never been run before.
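For what it's worth, Postgres has no SQL command to flush its shared buffers; a truly cold-cache run usually means restarting the server (and dropping the OS page cache on the host). What can be checked from SQL is how much of a run came from cache; a minimal sketch, reusing the graph and label names from this thread:

```sql
-- BUFFERS breaks the counters out per plan node: "shared hit" pages were
-- served from shared_buffers, "shared read" pages came from the OS cache
-- or disk, so a plan that is all "hit" was a warm-cache run.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * from cypher('passv2', $$
MATCH (V)-[R:WORKED_ON]-(V2)
RETURN V,R,V2
LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
```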
Once the performance gains are confirmed, they count. Give me just a moment to write the queries; I will update this same comment.
Please see if these two queries produce identical output.
Query 1:
```sql
SELECT * from cypher('passv2', $$
MATCH (V)-[R:WORKED_ON]->(V2)
RETURN V,R,V2
ORDER BY id(V) ASC, id(R) ASC, id(V2) ASC
LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
```
Query 2:
```sql
SELECT * from cypher('passv2', $$
MATCH (V)<-[R:WORKED_ON]-(V2)
RETURN V,R,V2
ORDER BY id(V) ASC, id(R) ASC, id(V2) ASC
LIMIT 100
$$) as (V agtype, R agtype, V2 agtype);
```
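If eyeballing 100 rows ever feels insufficient, the two result sets can also be compared wholesale with a set difference at the SQL level. This is just a sketch, and it assumes agtype values compare with the equality semantics EXCEPT needs; if this query and the version with the two halves swapped both return zero rows, the outputs are identical as sets:

```sql
-- Rows produced by the left-to-right pattern but absent from the
-- right-to-left pattern.
SELECT * from cypher('passv2', $$
MATCH (V)-[R:WORKED_ON]->(V2)
RETURN V,R,V2
$$) as (V agtype, R agtype, V2 agtype)
EXCEPT
SELECT * from cypher('passv2', $$
MATCH (V)<-[R:WORKED_ON]-(V2)
RETURN V,R,V2
$$) as (V agtype, R agtype, V2 agtype);
```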
Seeing the row counts skewed in the query plan depending on the direction of the relationship piqued my curiosity, so I would like to confirm.
I restarted the server and tried again with `SET LOCAL parallel_leader_participation = off;` and the results are the same. So it clearly works.
If you give any direction, the query runs much, much faster, and yes, it seems these two produce identical output. Query 1: https://explain.dalibo.com/plan/gc622ba38g91afb1
Since the 100 rows do produce identical output, I would suggest running once without LIMIT (I would expect that to also produce identical output), and then applying the direction, either way, permanently as part of the query design.
One more thing to check, and this one concerns the virtual machine rather than the Postgres engine or query tuning: please see what memory configuration is applied to the virtual machine running Postgres. This isn't really about the OS; it's purely VM configuration. The feature to check is memory allocation; there are two main settings for this feature.
`SET LOCAL parallel_leader_participation = off;` is friendly to this VM feature.
Please note, I am not suggesting modifying the memory allocation policy, as that may potentially impact other VMs running on the same host, and it is something with potentially wide impact, since RDBMSs are very greedy for RAM. However, it should help in understanding why some queries may run slower, in particular what causes the Gather process to wait. Gather needs a certain amount of memory to collect rows together, and the determination of how much memory it needs is driven by rather arcane rules of the RDBMS engine (in this case Postgres) that may ask for more than just the number of bytes the rows would strictly fit into. What we did with the tuning option applied is promote what is called a streaming approach, where the RDBMS tries to make do with as little RAM as possible, just enough to copy rows from the row-reading component of the RDBMS out to the row-producing component.
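For context while reasoning about this, the knobs involved can be inspected read-only from any session; a minimal sketch (nothing here changes the server):

```sql
SHOW shared_buffers;                  -- Postgres' own page cache
SHOW work_mem;                        -- per-operation sort/hash memory
SHOW max_parallel_workers_per_gather; -- workers a Gather node may launch
SHOW parallel_leader_participation;   -- whether the leader also scans
```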
If the directional form proves to work, it further promotes saving resources.

@Kobaruon As I mentioned earlier, -[]- , directionless, is a lot more intensive. You are trying to match any edge that has any connection to the currently selected vertices; directed edges will always run faster.
While you may have a similar query that runs faster and is also directionless, the runtime depends on the total number of vertices and edges that are selected and how connected they are. It could be that the faster query involves a less connected set than the slower one. Just a thought, without seeing the actual data.
Edit: I should mention that the match logic is likely inefficient for directionless edges, as AGE only builds digraphs. This is something our team should look into in the future, given time.
After closely studying the behavior of the operators ()-[]-() and ()-[]->(), it is evident that the data will be presented as follows. Suppose one creates two vertices,

v:Person {name:'Alice'}
v:Person {name:'Bob'}

and follows by creating two edges, Alice-[:knows]->Bob and Alice<-[:knows]-Bob, with the ids assigned as follows:

Alice: id 1
Bob: id 2
Edge :knows for the relationship Alice-[:knows]->Bob: id 3
Edge :knows for the relationship Alice<-[:knows]-Bob: id 4

The query MATCH (v1)-[:knows]-(v2) will return the following combinations of ids:

Row #1: 1, 3, 2
Row #2: 1, 4, 2
Row #3: 2, 3, 1
Row #4: 2, 4, 1

However, if we were to present not the ids (which are really system values used by machine internals) but just the distinct combinations of names,
```sql
MATCH (v1)-[:knows]-(v2)
WITH v1.name as Name1, v2.name as Name2
RETURN DISTINCT Name1, Name2
ORDER BY Name1 ASC, Name2 ASC
```
we will get back:

Alice, Bob
Bob, Alice
Now, if we were to issue the query (v1)-[:knows]->(v2), the id-producing query will return only half the rows:

Row #1: 1, 3, 2
Row #2: 2, 4, 1

However, for the name-producing query, there would be no difference.
In other words, (v1)-[:knows]-(v2) behaves as

(v1)-[:knows]->(v2) UNION (v1)<-[:knows]-(v2)
To summarise, looking at the UNION form just above: it is 2x the number of rows stored, with the same relationship listed twice, just with the two sides of the data switched.
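To see this behavior end to end, here is a minimal sketch on a throwaway graph (assumptions: AGE is loaded, ag_catalog is on the search_path, and the graph name 'scratch' is made up for the example):

```sql
SELECT create_graph('scratch');

-- Two vertices, plus one :knows edge in each direction.
SELECT * from cypher('scratch', $$
CREATE (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}),
       (a)-[:knows]->(b), (a)<-[:knows]-(b)
$$) as (result agtype);

-- Directionless: each edge is matched from both ends, so 4 rows come back.
SELECT * from cypher('scratch', $$
MATCH (v1:Person)-[:knows]-(v2:Person)
RETURN v1.name, v2.name
$$) as (Name1 agtype, Name2 agtype);

-- Directed: each edge is matched once, so 2 rows come back.
SELECT * from cypher('scratch', $$
MATCH (v1:Person)-[:knows]->(v2:Person)
RETURN v1.name, v2.name
$$) as (Name1 agtype, Name2 agtype);
```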
In case producing each relationship once, going left to right, is acceptable, the query can be expressed as

(v1)-[:knows]->(v2)
Hope this helps.
Thanks everyone for the quick help and detailed analysis.
This simple query takes about 3 minutes to run. Is this normal behaviour? Is there room for improvement here in terms of indexing or rewriting the query?
EXPLAIN ANALYZE:
```
                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1000.57..30222613.91 rows=100 width=96) (actual time=180465.331..182428.507 rows=100 loops=1)
   Buffers: shared hit=202943877
   ->  Gather  (cost=1000.57..414641535.58 rows=1372 width=96) (actual time=180465.329..182428.494 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=202943877
         ->  Nested Loop  (cost=0.57..414640398.38 rows=572 width=96) (actual time=61699.042..92998.472 rows=53 loops=3)
               Buffers: shared hit=202943877
               ->  Nested Loop  (cost=0.00..24546776.28 rows=224054050 width=472) (actual time=0.022..9394.517 rows=16767051 loops=3)
                     Buffers: shared hit=1738966
                     ->  Parallel Append  (cost=0.00..1355.14 rows=9663 width=236) (actual time=0.008..0.928 rows=724 loops=3)
                           Buffers: shared hit=49
                           ->  Parallel Seq Scan on pod "V_6"  (cost=0.00..997.72 rows=10791 width=257) (actual time=0.005..0.501 rows=681 loops=1)
                                 Buffers: shared hit=26
                           ->  Parallel Seq Scan on mars "V_3"  (cost=0.00..165.77 rows=1992 width=199) (actual time=0.007..0.020 rows=18 loops=2)
                                 Buffers: shared hit=2
                           ->  Parallel Seq Scan on "DNE" "V_2"  (cost=0.00..40.30 rows=810 width=52) (actual time=0.004..1.122 rows=1377 loops=1)
                                 Buffers: shared hit=16
                           ->  Parallel Seq Scan on earth "V_5"  (cost=0.00..2.78 rows=26 width=162) (actual time=0.002..0.034 rows=44 loops=1)
                                 Buffers: shared hit=2
                           ->  Parallel Seq Scan on venus "V_4"  (cost=0.00..1.55 rows=18 width=220) (actual time=0.003..0.026 rows=31 loops=1)
                                 Buffers: shared hit=1
                           ->  Parallel Seq Scan on "euCentral01" "V_7"  (cost=0.00..1.07 rows=2 width=59) (actual time=0.001..0.003 rows=4 loops=1)
                                 Buffers: shared hit=1
                           ->  Parallel Seq Scan on _ag_label_vertex "V_1"  (cost=0.00..1.02 rows=1 width=35) (actual time=0.003..0.004 rows=1 loops=1)
                                 Buffers: shared hit=1
                           ->  Parallel Seq Scan on host "V_8"  (cost=0.00..0.00 rows=1 width=79) (actual time=0.001..0.001 rows=0 loops=1)
                     ->  Append  (cost=0.00..1844.47 rows=23189 width=236) (actual time=0.002..7.452 rows=23148 loops=2173)
                           Buffers: shared hit=1738917
                           ->  Seq Scan on _ag_label_vertex "V2_1"  (cost=0.00..1.03 rows=1 width=35) (actual time=0.002..0.002 rows=1 loops=2173)
                                 Buffers: shared hit=2173
                           ->  Seq Scan on "DNE" "V2_2"  (cost=0.00..57.31 rows=1377 width=52) (actual time=0.002..0.191 rows=1377 loops=2173)
                                 Buffers: shared hit=34768
                           ->  Seq Scan on mars "V2_3"  (cost=0.00..207.61 rows=3387 width=199) (actual time=0.003..0.576 rows=3386 loops=2173)
                                 Buffers: shared hit=230245
                           ->  Seq Scan on venus "V2_4"  (cost=0.00..1.93 rows=31 width=220) (actual time=0.004..0.007 rows=31 loops=2172)
                                 Buffers: shared hit=2172
                           ->  Seq Scan on earth "V2_5"  (cost=0.00..3.32 rows=44 width=162) (actual time=0.002..0.008 rows=44 loops=2172)
                                 Buffers: shared hit=4344
                           ->  Seq Scan on pod "V2_6"  (cost=0.00..1224.32 rows=18344 width=257) (actual time=0.001..3.281 rows=18314 loops=2172)
                                 Buffers: shared hit=1463045
                           ->  Seq Scan on "euCentral01" "V2_7"  (cost=0.00..1.12 rows=4 width=59) (actual time=0.002..0.002 rows=4 loops=2170)
                                 Buffers: shared hit=2170
                           ->  Seq Scan on host "V2_8"  (cost=0.00..0.00 rows=1 width=79) (actual time=0.000..0.000 rows=0 loops=2170)
               ->  Bitmap Heap Scan on "WORKED_ON" "R"  (cost=0.57..1.71 rows=1 width=51) (actual time=0.003..0.003 rows=0 loops=50301154)
                     Recheck Cond: (((start_id = "V".id) AND (end_id = "V2".id)) OR ((start_id = "V2".id) AND (end_id = "V".id)))
                     Heap Blocks: exact=1
                     Buffers: shared hit=201204731
                     ->  BitmapOr  (cost=0.57..0.57 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=50301154)
                           Buffers: shared hit=201204624
                           ->  Bitmap Index Scan on "WORKED_ON_start_id_end_id_idx"  (cost=0.00..0.29 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=50301154)
                                 Index Cond: ((start_id = "V".id) AND (end_id = "V2".id))
                                 Buffers: shared hit=100602316
                           ->  Bitmap Index Scan on "WORKED_ON_start_id_end_id_idx"  (cost=0.00..0.29 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=50301154)
                                 Index Cond: ((start_id = "V2".id) AND (end_id = "V".id))
                                 Buffers: shared hit=100602308
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.694 ms
 Execution Time: 182428.614 ms
(60 rows)
```