Fixes #13601

Motivation

Listing archived workflows can be slow if you have a very large number of workflows (100,000+) or the average workflow size is high (100KB), even after the optimizations from #13566 and #13779. This PR makes some additional optimizations that speed up the queries by ~90% on MySQL and ~50% on PostgreSQL.

Modifications

The bottleneck for these queries depends on whether you use MySQL or PostgreSQL, and each required a different fix.

For PostgreSQL, the bottleneck was detoasting overhead, as explained in https://github.com/argoproj/argo-workflows/issues/13601#issuecomment-2420499551. The fix is to use a common table expression (CTE) to reduce the number of times the workflow column needs to be detoasted, as suggested by @kodieg in https://github.com/argoproj/argo-workflows/issues/13601#issuecomment-2421794871. The new query looks like this:
WITH workflows AS (
SELECT
"name",
"namespace",
"uid",
"phase",
"startedat",
"finishedat",
coalesce(workflow->'metadata', '{}') as metadata,
coalesce(workflow->'status', '{}') as status,
workflow->'spec'->>'suspend' as suspend
FROM "argo_archived_workflows"
WHERE (("clustername" = $1 AND "namespace" = $2 AND "instanceid" = $3))
ORDER BY "startedat" DESC
LIMIT 100
) (
SELECT
"name",
"namespace",
"uid",
"phase",
"startedat",
"finishedat",
coalesce(metadata->>'labels', '{}') as labels,
coalesce(metadata->>'annotations', '{}') as annotations,
coalesce(status->>'progress', '') as progress,
coalesce(metadata->>'creationTimestamp', '') as creationtimestamp,
"suspend",
coalesce(status->>'message', '') as message,
coalesce(status->>'estimatedDuration', '0') as estimatedduration,
coalesce(status->>'resourcesDuration', '{}') as resourcesduration
FROM "workflows"
)
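To see the effect for yourself, you can compare the plans of the old and new query shapes with EXPLAIN. The sketch below is illustrative only and not part of this PR: it selects just two fields and uses 'default' as a placeholder for the bound clustername parameter, but it shows the pattern the CTE is exploiting.

-- Illustrative only (not part of this PR). 'default' stands in for the bound
-- clustername parameter ($1 in the query above).

-- Old shape: every workflow->... expression detoasts the full JSON document.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  coalesce(workflow->'metadata'->>'labels', '{}') as labels,
  coalesce(workflow->'status'->>'progress', '') as progress
FROM "argo_archived_workflows"
WHERE "clustername" = 'default'
ORDER BY "startedat" DESC
LIMIT 100;

-- New shape: the CTE detoasts workflow once per row for the 100 rows that survive
-- the LIMIT, and the outer SELECT works on the already-extracted fragments.
EXPLAIN (ANALYZE, BUFFERS)
WITH workflows AS (
  SELECT
    coalesce(workflow->'metadata', '{}') as metadata,
    coalesce(workflow->'status', '{}') as status
  FROM "argo_archived_workflows"
  WHERE "clustername" = 'default'
  ORDER BY "startedat" DESC
  LIMIT 100
)
SELECT
  coalesce(metadata->>'labels', '{}') as labels,
  coalesce(status->>'progress', '') as progress
FROM workflows;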
For MySQL, the bottleneck was the optimizer inexplicably refusing to use the argo_archived_workflows_i4 index and instead using the primary key, which is much more expensive. As explained by @Danny5487401 in https://github.com/argoproj/argo-workflows/issues/13563#issuecomment-2339660938, two ways of solving that are using FORCE INDEX or adding a union index on (clustername, startedat). Using FORCE INDEX is slightly hacky, and adding a new index is needlessly wasteful when we already have argo_archived_workflows_i4, so I opted to modify that index to cover (clustername, startedat). The new query looks like this:
SELECT
`name`,
`namespace`,
`uid`,
`phase`,
`startedat`,
`finishedat`,
coalesce(workflow->'$.metadata.labels', '{}') as labels,
coalesce(workflow->'$.metadata.annotations', '{}') as annotations,
coalesce(workflow->>'$.status.progress', '') as progress,
coalesce(workflow->>'$.metadata.creationTimestamp', '') as creationtimestamp,
workflow->>'$.spec.suspend',
coalesce(workflow->>'$.status.message', '') as message,
coalesce(workflow->>'$.status.estimatedDuration', '0') as estimatedduration,
coalesce(workflow->'$.status.resourcesDuration', '{}') as resourcesduration
FROM `argo_archived_workflows`
WHERE ((`clustername` = ? AND `namespace` = ? AND `instanceid` = ?))
ORDER BY `startedat` DESC
LIMIT 100
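For reference, the index change amounts to something like the following. This is a sketch only: the real change ships as a schema migration in the codebase rather than hand-run SQL.

-- Sketch of the index modification described above (illustrative; the actual
-- change is applied by the archive's schema migrations). Rebuild
-- argo_archived_workflows_i4 to cover (clustername, startedat) so the optimizer
-- can satisfy the WHERE clustername = ? ... ORDER BY startedat DESC LIMIT 100
-- pattern from the index.
DROP INDEX `argo_archived_workflows_i4` ON `argo_archived_workflows`;
CREATE INDEX `argo_archived_workflows_i4` ON `argo_archived_workflows` (`clustername`, `startedat`);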
Verification
First, I used https://github.com/argoproj/argo-workflows/pull/13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:
time go run ./hack/db fake-archived-workflows --template "@very-large-workflow.yaml" --rows 100000
time go run ./hack/db fake-archived-workflows --template "@very-large-workflow.yaml" --rows 100000 -d 'mysql:password@tcp/argo'
Then, I ran make BenchmarkWorkflowArchive once on the main branch and once on this branch (with the migration applied), and used benchstat to compare:

PostgreSQL
MySQL