agilgur5 opened 2 months ago
I tested fetching the list of workflows using the new version v3.5.11, but still observed quite long times (~20s). We're using a PostgreSQL database.
I tested converting json to jsonb and observed times of approximately 9s.
Finally, I tested the generated columns variant proposed here and it was superfast.
Is it possible to pursue the generated columns optimization for Postgres?
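To illustrate, a rough sketch of the kind of generated column I mean (the field choice here is just an example; the actual proposal is in the linked comment):

-- Stored generated column extracting one JSON field at write time (requires Postgres 12+);
-- note that adding it rewrites the whole table, which is where the long migration comes from
ALTER TABLE argo_archived_workflows
    ADD COLUMN progress text
    GENERATED ALWAYS AS ((workflow::json)->'status'->>'progress') STORED;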
You should try the union index as written above in https://github.com/argoproj/argo-workflows/issues/13563#issuecomment-2339660938.
Your attempts suggest that your DB engine is, for some reason, still parsing the JSON in every row instead of just the single row. What version of Postgres are you on?
The generated columns can cause a very long migration time per https://github.com/argoproj/argo-workflows/issues/13295#issuecomment-2258657197 and require a newer Postgres, so they are not suitable for a patch upgrade. They can also slow down certain queries, as they increase the size of the table and rows. Lastly, they are unsustainable in the long run per https://github.com/argoproj/argo-workflows/issues/13295#issuecomment-2335027616, as we'd have to keep adding more columns as users want to filter or query on more fields.
Generated columns are also a band-aid and don't solve the root cause of the performance problem, which is that the DB is extracting JSON in every row, despite it not being used in the WHERE filter. The subquery and the different indices actually attempt to fix the root cause.
I already tested this with union index added.
We're using PostgreSQL 12.4.
Maybe my use case is a bit different. We're fetching information about workflows using /api/v1/workflows/argo-workflows
So we want to fetch multiple rows (not a single row). This can take many seconds even if we only want to fetch 1000 workflows. In this case, with a limit and almost no filtering, parsing the JSON is what's slow. Even a plain "fetch all data" query is slow. Even just migrating to jsonb helps a lot.
Maybe a way to improve the current situation would be to migrate the workflow column to the jsonb type, or to add a new jsonb column with its contents limited to what needs to be available for filtering and fetching in this query (because these workflow JSONs are huge).
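For reference, the in-place migration I have in mind is something like this (a sketch; it rewrites every row, so on a big archive it will take a while and lock the table):

-- Convert the workflow column to jsonb in place
ALTER TABLE argo_archived_workflows
    ALTER COLUMN workflow TYPE jsonb USING workflow::jsonb;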
I already tested this with union index added.
Can you clarify the performance of that? You didn't mention it in your first comment.
This can take over long seconds even if we want to fetch 1000 of workflows.
Oh, well if you're intentionally not paginating, it's rather unsurprising that it's slow. Then the indexing is not the main issue, but rather the actual retrieval. Unpaginated queries weren't the original purpose of this issue, though I suppose that too is a 3.5 regression due to the JSON retrieval.
Even migration to jsonb helps a lot.
JSONB is only supported by Postgres, but we could migrate Postgres to that if the migration time is minimal. You'd have to test that and then submit a PR for the migration and query differences, for Postgres only.
or add a new jsonb column with its contents limited to what we want to be available for filtering and fetching in this query
This will have similar sustainability issues, as the fields stored will change over time. That causes unexpected problems in the future as well, since adding fields will only populate new rows and not old rows. So that approach seems quite suboptimal.
I tested migration times on a small Postgres instance. However, I think it will be faster than the migration from text to json that may already have been applied.
For 100k workflows it took 96 seconds (on a pretty small and slow instance).
Currently, we're cleaning up this table every few hours, leaving us without history.
Currently, we're cleaning up this table every few hours, leaving us without history.
You should just use API requests with smaller limits and keep your history.
I mentioned in the contributor meeting I'd look into this. To evaluate these kinds of optimizations, you need a ton of rows in argo_archived_workflows, but there doesn't seem to be a way of generating that with the dev environment. Have you all been using real-world DB dumps? I don't have access to a dump with that kind of data, unfortunately.
I think there should be a script (probably under hack/) to generate dummy data and insert that into the DB. The schema for argo_archived_workflows is pretty straightforward, so that should be easy without introducing any new tools/libraries. I can work on that, unless anybody has any better ideas.
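As a rough sketch of the idea (simplified; the real script would need to generate realistic workflow CR JSON rather than a stub blob):

-- Bulk-insert dummy rows with generate_series; gen_random_uuid() is built in on Postgres 13+,
-- on older versions it needs the pgcrypto extension
INSERT INTO argo_archived_workflows
    (uid, clustername, instanceid, namespace, name, phase, startedat, finishedat, workflow)
SELECT gen_random_uuid()::text, 'default', '', 'ns-' || (i % 5), 'wf-' || i, 'Succeeded',
       now() - (i || ' minutes')::interval,
       now() - ((i - 1) || ' minutes')::interval,
       '{"metadata":{"labels":{}},"status":{"phase":"Succeeded","progress":"1/1"}}'
FROM generate_series(1, 100000) AS i;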
I don't have better ideas - I agree that would be useful though.
To evaluate these kinds of optimizations, you need a ton of rows in argo_archived_workflows, but there doesn't seem to be a way of generating that with the dev environment
Indeed that is a challenge of testing this and one reason why higher scale performance regressions are hard to catch early -- both dev and test would need to operate with such large numbers.
Have you all been using real-world DB dumps?
No, we've actually been relying on user feedback to test much of the time.
I think there should be a script (probably under hack/) to generate dummy data and insert that into the DB.
Yea, you can probably get this working with faker or similar tooling.
so that should be easy without introducing any new tools/libraries
The JSON blob is the primary column needing an optimization, and its schema is the entire CR, so that part would be difficult without tooling 😅
Okay, I entered a draft PR with a simple CLI tool to populate the DB with randomly-generated workflows: https://github.com/argoproj/argo-workflows/pull/13715
Unfortunately, I can't reproduce the performance issues reported in https://github.com/argoproj/argo-workflows/pull/13566. @ryancurrah said he had 93,135 workflows in https://github.com/argoproj/argo-workflows/pull/13566#issuecomment-2346779218, so I used the tool to populate the DB with 100,000 randomly-generated workflows.
$ go run ./hack/db/main.go fake-archived-workflows --rows 100000 --seed 2600415997737113470
Using seed 2600415997737113470
Clusters: [cx5j6rrzm5vnkfqqr4d]
Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
Inserted 100000 rows
Every workflow is ~5900 bytes (edited; I originally wrote ~1466 bytes, see my correction below) and associated with the same cluster ("cx5j6rrzm5vnkfqqr4d"). The tool generated 5 namespaces that are randomly assigned to each workflow (i.e. each namespace has ~20,000 workflows associated with it). Then, I tried executing a simple variation of the "before" query from that issue,
but it consistently executes in <2ms:
Limit (cost=1.13..60.41 rows=1 width=318) (actual time=1.067..1.068 rows=1 loops=1)
-> Nested Loop (cost=1.13..100778.36 rows=1700 width=318) (actual time=1.065..1.066 rows=1 loops=1)
Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows_labels_1.uid)::text)
-> Nested Loop (cost=0.71..89644.36 rows=3364 width=1617) (actual time=0.071..0.071 rows=1 loops=1)
-> Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows (cost=0.29..24325.29 rows=19917 width=1580) (actual time=0.026..0.027 rows=1 loops=1)
Filter: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'bs98pxsfhs9z'::text))
Rows Removed by Filter: 2
-> Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels (cost=0.42..3.28 rows=1 width=57) (actual time=0.040..0.040 rows=1 loops=1)
Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
Filter: ((value)::text = 'Succeeded'::text)
-> Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels argo_archived_workflows_labels_1 (cost=0.42..3.28 rows=1 width=57) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text))
Filter: ((value)::text = 'Persisted'::text)
Planning Time: 1.916 ms
Execution Time: 1.226 ms
The query plan shows it's doing an index scan on the 19,917 rows associated with the namespace bs98pxsfhs9z, which is what it should be doing. I'm not sure why @ryancurrah got such different results. Maybe the average size of the workflows in his DB is very large? If so, that would indicate TOAST issues. You can use this query to check:
select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;
Results for me:
avg | avg
-----------------------+-----------------------
5907.8400000000000000 | 1488.8100000000000000
Every workflow is ~1466 bytes
Maybe the average size of the workflows in his DB is larger than 2KB?
Yea, I wouldn't be surprised by that; after including the status, workflows can be very large. See also node status offloading and #7121.
I edited my previous comment because I think I was wrong about the workflow size being ~1466 bytes. That was the value returned by select avg(pg_column_size(workflow)) from argo_archived_workflows, which I assumed included TOAST'd values, but it appears it doesn't (relevant StackOverflow). The actual size is ~5900 bytes per select avg(octet_length(workflow::text)) from argo_archived_workflows.
These 100,000 workflows were from a single namespace. I calculated the average and max size of the workflow column as well:
argo=# select avg(octet_length(workflow::text)) from argo_archived_workflows;
avg
--------------------
18599.709432882786
argo=# select max(octet_length(workflow::text)) from argo_archived_workflows;
max
-------
65875
Also, your explain indicates that you are retrieving a single row. That works instantly for me as well, but retrieving 1000 rows takes much more time.
With JSONB it takes less than 5 seconds.
Even so, fetching a single row takes 60ms.
Below is the execution plan for fetching a single row:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.12..693.65 rows=1 width=343) (actual time=1.253..1.254 rows=1 loops=1)
-> Nested Loop (cost=1.12..65098.89 rows=94 width=343) (actual time=1.252..1.253 rows=1 loops=1)
Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows_labels_1.uid)::text)
-> Nested Loop (cost=0.71..60559.85 rows=1096 width=182) (actual time=0.039..0.039 rows=1 loops=1)
-> Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows (cost=0.29..2133.01 rows=14172 width=145) (actual time=0.018..0.018 rows=1 loops=1)
Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
-> Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels (cost=0.42..4.12 rows=1 width=45) (actual time=0.018..0.018 rows=1 loops=1)
Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
Filter: ((value)::text = 'Succeeded'::text)
-> Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels argo_archived_workflows_labels_1 (cost=0.42..4.12 rows=1 width=45) (actual time=0.008..0.008 rows=1 loops=1)
Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text))
Filter: ((value)::text = 'Persisted'::text)
Planning Time: 0.981 ms
Execution Time: 1.316 ms
(14 rows)
Time: 58.798 ms
And for fetching 1000 rows:
Limit (cost=5133.53..5133.77 rows=94 width=343) (actual time=58700.688..58700.913 rows=1000 loops=1)
-> Sort (cost=5133.53..5133.77 rows=94 width=343) (actual time=58700.687..58700.843 rows=1000 loops=1)
Sort Key: argo_archived_workflows.startedat DESC
Sort Method: top-N heapsort Memory: 2080kB
-> Nested Loop (cost=2490.79..5130.45 rows=94 width=343) (actual time=29.798..58665.875 rows=13214 loops=1)
Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows.uid)::text)
-> Hash Join (cost=2490.38..4741.70 rows=94 width=90) (actual time=18.732..67.268 rows=13214 loops=1)
Hash Cond: ((argo_archived_workflows_labels_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
-> Bitmap Heap Scan on argo_archived_workflows_labels argo_archived_workflows_labels_1 (cost=180.87..2429.01 rows=1215 width=45) (actual time=3.345..27.844 rows=13983 loops=1)
Recheck Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
Filter: ((clustername)::text = 'default'::text)
Heap Blocks: exact=2321
-> Bitmap Index Scan on argo_archived_workflows_labels_i1 (cost=0.00..180.57 rows=1215 width=0) (actual time=2.986..2.986 rows=14183 loops=1)
Index Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
-> Hash (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.378..15.379 rows=13214 loops=1)
Buckets: 16384 (originally 2048) Batches: 1 (originally 1) Memory Usage: 1122kB
-> Bitmap Heap Scan on argo_archived_workflows_labels (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.275..11.281 rows=13214 loops=1)
Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
Filter: ((clustername)::text = 'default'::text)
Heap Blocks: exact=2283
-> Bitmap Index Scan on argo_archived_workflows_labels_i1 (cost=0.00..163.38 rows=1096 width=0) (actual time=2.924..2.924 rows=13414 loops=1)
Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
-> Index Scan using argo_archived_workflows_pkey on argo_archived_workflows (cost=0.41..4.05 rows=1 width=145) (actual time=0.011..0.011 rows=1 loops=13214)
Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows_labels_1.uid)::text))
Filter: (((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
Planning Time: 0.959 ms
Execution Time: 58701.536 ms
(27 rows)
Time: 58733.573 ms (00:58.734)
I started to dig a little bit more and found that fetching even a single row takes that long if I do it with some offset.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5133.34..5133.34 rows=1 width=365) (actual time=60255.876..60255.880 rows=1 loops=1)
-> Sort (cost=5133.31..5133.55 rows=94 width=365) (actual time=60255.872..60255.877 rows=12 loops=1)
Sort Key: argo_archived_workflows.startedat DESC
Sort Method: top-N heapsort Memory: 50kB
-> Nested Loop (cost=2490.79..5131.16 rows=94 width=365) (actual time=29.639..60230.904 rows=13029 loops=1)
Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows.uid)::text)
-> Hash Join (cost=2490.38..4741.70 rows=94 width=90) (actual time=19.018..63.536 rows=13029 loops=1)
Hash Cond: ((argo_archived_workflows_labels_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
-> Bitmap Heap Scan on argo_archived_workflows_labels argo_archived_workflows_labels_1 (cost=180.87..2429.01 rows=1215 width=45) (actual time=3.701..26.502 rows=13798 loops=1)
Recheck Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
Filter: ((clustername)::text = 'default'::text)
Heap Blocks: exact=2291
-> Bitmap Index Scan on argo_archived_workflows_labels_i1 (cost=0.00..180.57 rows=1215 width=0) (actual time=3.346..3.347 rows=14198 loops=1)
Index Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
-> Hash (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.308..15.309 rows=13029 loops=1)
Buckets: 16384 (originally 2048) Batches: 1 (originally 1) Memory Usage: 1108kB
-> Bitmap Heap Scan on argo_archived_workflows_labels (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.407..11.517 rows=13029 loops=1)
Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
Filter: ((clustername)::text = 'default'::text)
Heap Blocks: exact=2253
-> Bitmap Index Scan on argo_archived_workflows_labels_i1 (cost=0.00..163.38 rows=1096 width=0) (actual time=2.962..2.963 rows=13429 loops=1)
Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
-> Index Scan using argo_archived_workflows_pkey on argo_archived_workflows (cost=0.41..4.05 rows=1 width=145) (actual time=0.010..0.010 rows=1 loops=13029)
Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows_labels_1.uid)::text))
Filter: (((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
Planning Time: 0.977 ms
Execution Time: 60255.951 ms
(27 rows)
Time: 60343.720 ms (01:00.344)
I noticed that you used the old query (and I did as well, with EXISTS), which supposedly parses all the JSON. So I tested with the new query, as sketched below.
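Roughly, the shape of the new query I tested (a sketch; the full column list from the real query is abbreviated):

-- JSON extraction happens only for the 1000 rows matched by the subquery
SELECT name, namespace, uid, startedat, finishedat,
       coalesce((workflow::json)->'metadata'->>'labels', '{}') AS labels
FROM argo_archived_workflows
WHERE clustername = 'default'
  AND uid IN
    (SELECT uid
     FROM argo_archived_workflows
     WHERE clustername = 'default'
       AND instanceid = ''
       AND namespace = 'argo-workflows'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND uid = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded')
     ORDER BY startedat DESC
     LIMIT 1000);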
The JSON version takes 5 seconds:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=4276.07..6182.55 rows=1000 width=375) (actual time=47.982..5625.164 rows=1000 loops=1)
Hash Cond: ((argo_archived_workflows.uid)::text = ("ANY_subquery".uid)::text)
-> Seq Scan on argo_archived_workflows (cost=0.00..1778.15 rows=14172 width=137) (actual time=0.008..7.680 rows=13812 loops=1)
Filter: ((clustername)::text = 'default'::text)
-> Hash (cost=4263.57..4263.57 rows=1000 width=37) (actual time=36.108..36.112 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 76kB
-> Subquery Scan on "ANY_subquery" (cost=4251.07..4263.57 rows=1000 width=37) (actual time=35.608..35.879 rows=1000 loops=1)
-> Limit (cost=4251.07..4253.57 rows=1000 width=45) (actual time=35.607..35.738 rows=1000 loops=1)
-> Sort (cost=4251.07..4253.81 rows=1096 width=45) (actual time=35.606..35.666 rows=1000 loops=1)
Sort Key: argo_archived_workflows_1.startedat DESC
Sort Method: top-N heapsort Memory: 205kB
-> Hash Join (cost=2309.51..4195.73 rows=1096 width=45) (actual time=15.081..28.178 rows=13043 loops=1)
Hash Cond: ((argo_archived_workflows_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
-> Seq Scan on argo_archived_workflows argo_archived_workflows_1 (cost=0.00..1849.01 rows=14172 width=53) (actual time=0.003..7.618 rows=13812 loops=1)
Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
-> Hash (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.071..15.072 rows=13043 loops=1)
Buckets: 16384 (originally 2048) Batches: 1 (originally 1) Memory Usage: 1109kB
-> Bitmap Heap Scan on argo_archived_workflows_labels (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.226..11.272 rows=13043 loops=1)
Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
Filter: ((clustername)::text = 'default'::text)
Heap Blocks: exact=2255
-> Bitmap Index Scan on argo_archived_workflows_labels_i1 (cost=0.00..163.38 rows=1096 width=0) (actual time=2.883..2.883 rows=13443 loops=1)
Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
Planning Time: 0.641 ms
Execution Time: 5625.530 ms
(25 rows)
Time: 5714.274 ms (00:05.714)
while the JSONB version takes 500ms:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=4276.07..6142.55 rows=1000 width=375) (actual time=36.683..494.291 rows=1000 loops=1)
Hash Cond: ((argo_archived_workflows.uid)::text = ("ANY_subquery".uid)::text)
-> Seq Scan on argo_archived_workflows (cost=0.00..1778.15 rows=14172 width=137) (actual time=0.008..5.076 rows=13812 loops=1)
Filter: ((clustername)::text = 'default'::text)
-> Hash (cost=4263.57..4263.57 rows=1000 width=37) (actual time=35.711..35.716 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 76kB
-> Subquery Scan on "ANY_subquery" (cost=4251.07..4263.57 rows=1000 width=37) (actual time=35.226..35.483 rows=1000 loops=1)
-> Limit (cost=4251.07..4253.57 rows=1000 width=45) (actual time=35.225..35.355 rows=1000 loops=1)
-> Sort (cost=4251.07..4253.81 rows=1096 width=45) (actual time=35.224..35.284 rows=1000 loops=1)
Sort Key: argo_archived_workflows_1.startedat DESC
Sort Method: top-N heapsort Memory: 205kB
-> Hash Join (cost=2309.51..4195.73 rows=1096 width=45) (actual time=14.685..27.776 rows=13043 loops=1)
Hash Cond: ((argo_archived_workflows_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
-> Seq Scan on argo_archived_workflows argo_archived_workflows_1 (cost=0.00..1849.01 rows=14172 width=53) (actual time=0.003..7.707 rows=13812 loops=1)
Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
-> Hash (cost=2295.81..2295.81 rows=1096 width=45) (actual time=14.676..14.678 rows=13043 loops=1)
Buckets: 16384 (originally 2048) Batches: 1 (originally 1) Memory Usage: 1109kB
-> Bitmap Heap Scan on argo_archived_workflows_labels (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.230..11.149 rows=13043 loops=1)
Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
Filter: ((clustername)::text = 'default'::text)
Heap Blocks: exact=2255
-> Bitmap Index Scan on argo_archived_workflows_labels_i1 (cost=0.00..163.38 rows=1096 width=0) (actual time=2.884..2.885 rows=13443 loops=1)
Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
Planning Time: 0.596 ms
Execution Time: 494.462 ms
(25 rows)
Time: 535.514 ms
I tested fetching 10, 100, and 1000 rows and the times increase almost linearly, but the JSONB version is consistently 10x faster.
@kodieg Thanks for the details!
Also, your explain indicates that you are retrieving a single row. That works instantly for me as well, but retrieving 1000 rows takes much more time.
The queries in the PR description of https://github.com/argoproj/argo-workflows/pull/13566 all had LIMIT 1, and the screenshot shows the query taking 11495ms before the change. If it's nearly instant for you, that means you can't reproduce it either. Of course, there are still other performance issues here, and fixing those might end up fixing the original issue, but we can't be sure if we can't reproduce it.
In any case, I tried to replicate your setup by creating a large workflow to use as a template: https://gist.github.com/MasonM/15cfb3c4f497324ac67d1a1a334d65db
Then I ran the following command to insert 100,000 randomized workflows using that template, all of which are associated with a single namespace (bs98pxsfhs9z):
$ time go run ./hack/db/main.go fake-archived-workflows --rows 100000 --namespaces 1 --seed 2600415997737113470 --template '@large-workflow.yaml'
Using seed 2600415997737113470
Clusters: [cx5j6rrzm5vnkfqqr4d]
Namespaces: [bs98pxsfhs9z]
Inserted 100000 rows
real 19m7.916s
user 2m47.567s
sys 0m48.239s
The average workflow size is now ~19KB:
postgres=# select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;
avg | avg
--------------------+-----------------------
18900.846520000000 | 1707.9891100000000000
(1 row)
After that, I ran the query from my original comment (the "before" query), which completes in <4ms, so I still can't reproduce the original issue. If I change LIMIT 1 to LIMIT 1000, the execution time jumps to ~450ms, which I'd consider to still be pretty decent, since it's rare someone would need to list 1000 workflows at once. However, the "new" query from https://github.com/argoproj/argo-workflows/pull/13566 is much slower for 1,000 rows, taking ~1250ms:
postgres=# EXPLAIN ANALYZE SELECT name,
postgres-# namespace,
postgres-# UID,
postgres-# phase,
postgres-# startedat,
postgres-# finishedat,
postgres-# coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
postgres-# coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
postgres-# coalesce((workflow::JSON)->'status'->>'progress', '') AS progress,
postgres-# coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
postgres-# (workflow::JSON)->'spec'->>'suspend' AS suspend,
postgres-# coalesce((workflow::JSON)->'status'->>'message', '') AS message,
postgres-# coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
postgres-# coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
postgres-# FROM "argo_archived_workflows"
postgres-# WHERE "clustername" = 'cx5j6rrzm5vnkfqqr4d'
postgres-# AND UID IN
postgres-# (SELECT UID
postgres(# FROM "argo_archived_workflows"
postgres(# WHERE (("clustername" = 'cx5j6rrzm5vnkfqqr4d'
postgres(# AND "instanceid" = '')
postgres(# AND "namespace" = 'bs98pxsfhs9z'
postgres(# AND EXISTS
postgres(# (SELECT 1
postgres(# FROM argo_archived_workflows_labels
postgres(# WHERE clustername = argo_archived_workflows.clustername
postgres(# AND UID = argo_archived_workflows.uid
postgres(# AND name = 'workflows.argoproj.io/phase'
postgres(# AND value = 'Succeeded')
postgres(# )
postgres(# ORDER BY "startedat" DESC
postgres(# LIMIT 1000);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=9212.87..16488.55 rows=1000 width=349) (actual time=33.046..1213.150 rows=1000 loops=1)
-> HashAggregate (cost=9212.45..9222.45 rows=1000 width=37) (actual time=31.731..32.037 rows=1000 loops=1)
Group Key: ("ANY_subquery".uid)::text
-> Subquery Scan on "ANY_subquery" (cost=1000.74..9209.95 rows=1000 width=37) (actual time=11.299..31.039 rows=1000 loops=1)
-> Limit (cost=1000.74..9199.95 rows=1000 width=45) (actual time=11.293..30.882 rows=1000 loops=1)
-> Gather Merge (cost=1000.74..136353.38 rows=16508 width=45) (actual time=11.290..30.783 rows=1000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=0.71..133447.92 rows=6878 width=45) (actual time=0.222..27.746 rows=767 loops=3)
-> Parallel Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows argo_archived_workflows_1 (cost=0.29..102561.01 rows=41667 width=65) (actual time=0.051..7.306 rows=2262 loops=3)
Filter: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'bs98pxsfhs9z'::text))
-> Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels (cost=0.42..0.74 rows=1 width=57) (actual time=0.009..0.009 rows=0 loops=6787)
Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows_1.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
Filter: ((value)::text = 'Succeeded'::text)
Rows Removed by Filter: 1
-> Index Scan using argo_archived_workflows_pkey on argo_archived_workflows (cost=0.42..7.29 rows=1 width=1801) (actual time=0.005..0.005 rows=1 loops=1000)
Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = ("ANY_subquery".uid)::text))
Planning Time: 2.211 ms
Execution Time: 1213.592 ms
(19 rows)
My best guess is that the performance is highly label-dependent. Could you please run the following query so we can get some stats?
select count(*) total_labels, avg(labels.count) average_labels_per_workflow, avg(labels.avg) average_label_size from (select count(*), avg(octet_length(name)) from argo_archived_workflows_labels group by uid) labels;
My best guess is that the performance is highly label-dependent
all of which are associated with a single namespace (bs98pxsfhs9z):
The namespace is also possibly a confounding variable. The DB version is another (more optimizations, engines, etc in newer versions).
However, the "new" query from #13566 is much slower for 1,000 rows, taking ~1250ms:
Interesting. I'd expect it to lose some performance since it adds a subquery, but I did not expect a contradictory result. That being said, the original goal of this issue (per its description) is to replace that with a union index anyway.
Also, hopefully @ryancurrah will see this soon and add some reproducible color. The EXPLAIN logs in the original issue might help as well.
In our deployment we only use one namespace for workflows. The workflows have around 10 labels; 7 of the labels are ones we added, and the rest are from Argo.
Result of:
select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;
"avg" "avg-2" 69368.494313104989 12034.487900223380
Result of:
select count(*) total_labels, avg(labels.count) average_labels_per_workflow, avg(labels.avg) average_label_size from (select count(*), avg(octet_length(name)) from argo_archived_workflows_labels group by uid) labels;
"total_labels" "average_labels_per_workflow" "average_label_size" 107440 12.6245904690990320 21.3524800539094153
Hope this helps.
Okay, I spent a lot of time looking into this, and I'm pretty sure the main problem is a TOAST issue (as I alluded to in https://github.com/argoproj/argo-workflows/issues/13601#issuecomment-2395630467) introduced in https://github.com/argoproj/argo-workflows/pull/12912. That PR changed the query used to list workflows from select workflow from argo_archived_workflows...
to
select name, namespace, uid, phase, startedat, finishedat, coalesce((workflow::json)->'metadata'->>'labels', '{}') as labels, coalesce((workflow::json)->'metadata'->>'annotations', '{}') as annotations, coalesce((workflow::json)->'status'->>'progress', '') as progress from argo_archived_workflows ...
While this change reduces the amount of data that needs to be transferred, it significantly reduces performance for users with a large number of workflows, due to TOAST. The article Jsonb: few more stories about the performance has a good explanation:
At the first glance there is no significant differense between them, but this impression is wrong. In the first case data is detoasted just once, everything is fine. Unfortunately, in the second case data is detoasted twice for both keys. And it’s like that not only for jsonb, but for all composite data types in PostgreSQL. In general, it’s not a big issue, but if our document is quite big and we’re trying to extract too many keys, we can end up with huge overhead.
In this case, the workflow column is being detoasted three times. If the average workflow is small, the performance impact is negligible, but it can become very high if you are storing 50KB+ workflows.
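If you want to see how much of the table's data actually lives in TOAST, something like this works (a sketch using the system catalogs):

-- Compare the main table size to its associated TOAST table
SELECT pg_size_pretty(pg_relation_size(oid)) AS main_size,
       pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'argo_archived_workflows';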
Reverting https://github.com/argoproj/argo-workflows/pull/12912 solves this, which I verified with the following steps:
Use https://github.com/argoproj/argo-workflows/pull/13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:
$ time go run ./hack/db fake-archived-workflows --template "@very-large-workflow.yaml" --rows 100000
Using seed 1935828722624432788
Clusters: [default]
Namespaces: [argo]
Inserted 100000 rows
real 18m35.316s
user 3m2.447s
sys 0m44.972s
Run the benchmarks on the current code: make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
Revert https://github.com/argoproj/argo-workflows/pull/12912, then run the benchmarks again: make BenchmarkWorkflowArchive > postgres_revert_10000_workflows.txt
Compare results using benchstat:
$ benchstat postgres_before_10000_workflows.txt postgres_revert_10000_workflows.txt
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
│ postgres_before_10000_workflows.txt │ postgres_revert_10000_workflows.txt │
│ sec/op │ sec/op vs base │
WorkflowArchive/ListWorkflows-12 184.47m ± ∞ ¹ 47.90m ± ∞ ¹ ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12 184.23m ± ∞ ¹ 46.70m ± ∞ ¹ ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12 13.35m ± ∞ ¹ 12.99m ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 76.84m 30.74m -59.99%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
However, reverting https://github.com/argoproj/argo-workflows/pull/12912 could re-introduce the memory issues mentioned in that PR, and will probably worsen query performance for MySQL (though I haven't tested that).
Migrating the workflow column to be of type jsonb is the biggest thing I could identify to improve performance. It doesn't eliminate the detoast overhead, but JSON operators are so much faster on jsonb that it doesn't matter. The only downside is it takes a long time to migrate if you have a lot of workflows. I'll enter a followup PR with more details, but I figured I'd get my findings out first. EDIT: Here's the PR: https://github.com/argoproj/argo-workflows/pull/13779
One option I found on stack overflow: https://stackoverflow.com/questions/65820380/improving-performance-of-postgres-jsonb-queries-combined-with-relational-queries
I did quick tests of a few variations of the queries:
EXPLAIN ANALYZE
WITH tab as (SELECT name,
namespace,
UID,
phase,
startedat,
finishedat,
(workflow::jsonb)->'metadata' as metadata,
(workflow::jsonb)->'status' as status,
(workflow::jsonb)->'spec' as spec
FROM "argo_archived_workflows"
LIMIT 1000
)
SELECT namespace, UID, phase, startedat, finishedat,
coalesce(metadata->>'labels', '{}') AS labels,
coalesce(metadata->>'annotations', '{}') AS annotations,
coalesce(status->>'progress', '') AS progress,
coalesce(metadata->>'creationTimestamp', '') AS creationtimestamp,
spec->>'suspend' AS suspend,
coalesce(status->>'message', '') AS message,
coalesce(status->>'estimatedDuration', '0') AS estimatedduration,
coalesce(status->>'resourcesDuration', '{}') AS resourcesduration
FROM tab
and
EXPLAIN ANALYZE
WITH tab as (SELECT name,
namespace,
UID,
phase,
startedat,
finishedat,
(workflow::jsonb) as workflow
FROM "argo_archived_workflows"
LIMIT 1000
)
SELECT namespace, UID, phase, startedat, finishedat,
coalesce((workflow)->'metadata'->>'labels', '{}') AS labels,
coalesce((workflow)->'metadata'->>'annotations', '{}') AS annotations,
coalesce((workflow)->'status'->>'progress', '') AS progress,
coalesce((workflow)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
(workflow)->'spec'->>'suspend' AS suspend,
coalesce((workflow)->'status'->>'message', '') AS message,
coalesce((workflow)->'status'->>'estimatedDuration', '0') AS estimatedduration,
coalesce((workflow)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM tab
The first one worked better for the jsonb column; the second helped mitigate the case of the column still having the json type.
@MasonM could you maybe add those queries to your benchmarks?
@kodieg Thanks! The first query (with the CTE) is indeed a lot faster when I test it on PostgreSQL (~40% with json and ~60% with jsonb). I can enter a PR with more details once https://github.com/argoproj/argo-workflows/pull/13767 is merged (trying to juggle all these branches is getting to be a bit of a hassle).
As promised, I entered https://github.com/argoproj/argo-workflows/pull/13819 with the optimization @kodieg found, as well as optimizations for MySQL.
Awesome work Mason 🚀 ❤️
And thanks @kodieg for helping find a more optimal query!
Pre-requisites
- I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
This is a follow-up to https://github.com/argoproj/argo-workflows/issues/13295 and https://github.com/argoproj/argo-workflows/pull/13566. We can further improve the query performance, seemingly by another order of magnitude, by adding a union index:
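As a sketch, assuming the index should cover the equality filters plus the startedat sort key of the list query (the exact definition is in the quoted comments below):

-- Hypothetical composite ("union") index; name and column set are illustrative
CREATE INDEX argo_archived_workflows_i5
    ON argo_archived_workflows (clustername, instanceid, namespace, startedat DESC);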
This can be added as a migration after this line in migration.go. We should also test the performance of this index with and without the subquery. If the subquery is no longer necessary, then we can revert #13566 along with this change.
More context
Quoting myself in https://github.com/argoproj/argo-workflows/pull/13566#pullrequestreview-2301010281:
Quoting @Danny5487401 from https://github.com/argoproj/argo-workflows/issues/13563#issuecomment-2339660938
And then my response:
We should also add a simplified/minimal "performance regression test" to ensure that, well, the performance doesn't regress, per https://github.com/argoproj/argo-workflows/pull/13566#issuecomment-2336734101:
Version(s)
v3.5, v3.5.10, 729ac17