
fix: further optimize archive workflow listing. Fixes #13601 #13819

Open · MasonM opened 3 weeks ago

MasonM commented 3 weeks ago

Fixes #13601

Motivation

Listing archived workflows can be slow if you have a very large number of workflows (100,000+) or the average workflow size is large (100KB), even after the optimizations from #13566 and #13779. This PR makes additional optimizations that speed up the listing queries by ~90% on MySQL and ~50% on PostgreSQL.

Modifications

The bottleneck for these queries depends on whether you use MySQL or PostgreSQL, and each required a different fix. For PostgreSQL, the bottleneck was detoasting overhead, as explained in https://github.com/argoproj/argo-workflows/issues/13601#issuecomment-2420499551. The fix was to use a common table expression (CTE) to reduce the number of times the workflow column needs to be detoasted, as suggested by @kodieg in https://github.com/argoproj/argo-workflows/issues/13601#issuecomment-2421794871. The new query looks like this:

WITH workflows AS (
  SELECT
    "name",
    "namespace",
    "uid",
    "phase",
    "startedat",
    "finishedat",
    coalesce(workflow->'metadata', '{}') as metadata,
    coalesce(workflow->'status', '{}') as status,
    workflow->'spec'->>'suspend' as suspend
  FROM "argo_archived_workflows"
  WHERE (("clustername" = $1 AND "namespace" = $2 AND "instanceid" = $3))
  ORDER BY "startedat" DESC
  LIMIT 100
) (
  SELECT
    "name",
    "namespace",
    "uid",
    "phase",
    "startedat",
    "finishedat",
    coalesce(metadata->>'labels', '{}') as labels,
    coalesce(metadata->>'annotations', '{}') as annotations,
    coalesce(status->>'progress', '') as progress,
    coalesce(metadata->>'creationTimestamp', '') as creationtimestamp,
    "suspend",
    coalesce(status->>'message', '') as message,
    coalesce(status->>'estimatedDuration', '0') as estimatedduration,
    coalesce(status->>'resourcesDuration', '{}') as resourcesduration
  FROM "workflows"
)
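To confirm the detoasting savings locally, one option is to compare EXPLAIN (ANALYZE, BUFFERS) output for the old and new query shapes. Below is a minimal sketch (not the exact query the server generates) of the pre-CTE shape, where every JSON expression dereferences, and therefore detoasts, the large workflow column separately; running it against a populated argo_archived_workflows table and comparing the timings and buffer counts with the CTE version above should show the difference:

-- Illustrative sketch of the pre-CTE query shape: each workflow->... expression
-- detoasts the large "workflow" JSONB value again, which the CTE above avoids.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  "name",
  "namespace",
  "uid",
  coalesce(workflow->'metadata'->>'labels', '{}') as labels,
  coalesce(workflow->'metadata'->>'annotations', '{}') as annotations,
  coalesce(workflow->'status'->>'progress', '') as progress,
  coalesce(workflow->'status'->>'message', '') as message
FROM "argo_archived_workflows"
ORDER BY "startedat" DESC
LIMIT 100;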

For MySQL, the bottleneck was the optimizer inexplicably refusing to use the argo_archived_workflows_i4 index and instead scanning the primary key, which is much more expensive. As explained by @Danny5487401 in https://github.com/argoproj/argo-workflows/issues/13563#issuecomment-2339660938, two ways of solving that are using FORCE INDEX or adding a composite index on (clustername, startedat). Using FORCE INDEX is slightly hacky, and adding a new index is needlessly wasteful when we already have argo_archived_workflows_i4, so I opted to modify that index to cover (clustername, startedat); a sketch of the index change follows the query below. The new query looks like this:

SELECT
  `name`,
  `namespace`,
  `uid`,
  `phase`,
  `startedat`,
  `finishedat`,
  coalesce(workflow->'$.metadata.labels', '{}') as labels,
  coalesce(workflow->'$.metadata.annotations', '{}') as annotations,
  coalesce(workflow->>'$.status.progress', '') as progress,
  coalesce(workflow->>'$.metadata.creationTimestamp', '') as creationtimestamp,
  workflow->>'$.spec.suspend',
  coalesce(workflow->>'$.status.message', '') as message,
  coalesce(workflow->>'$.status.estimatedDuration', '0') as estimatedduration,
  coalesce(workflow->'$.status.resourcesDuration', '{}') as resourcesduration
FROM `argo_archived_workflows`
WHERE ((`clustername` = ?  AND `namespace` = ? AND `instanceid` = ?))
ORDER BY `startedat` DESC
LIMIT 100
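The query itself is unchanged apart from benefiting from the index; the index change lives in a migration. As a rough sketch (the actual migration code in this PR may differ), modifying argo_archived_workflows_i4 to cover (clustername, startedat) amounts to:

-- Hedged sketch of the index change described above; the actual migration may differ.
-- Recreating argo_archived_workflows_i4 over (clustername, startedat) lets the optimizer
-- satisfy both the WHERE clause and the ORDER BY `startedat` DESC LIMIT 100 from one index,
-- instead of falling back to the primary key.
DROP INDEX `argo_archived_workflows_i4` ON `argo_archived_workflows`;
CREATE INDEX `argo_archived_workflows_i4` ON `argo_archived_workflows` (`clustername`, `startedat`);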

Verification

First, I used https://github.com/argoproj/argo-workflows/pull/13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template.

Then, I ran make BenchmarkWorkflowArchive once on the main branch and once on this branch (with the migration applied), and used benchstat to compare the results.
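For reference, benchstat (from golang.org/x/perf) compares two sets of Go benchmark output, so the comparison looks roughly like this (file names are illustrative):

# Run on each branch; redirect the benchmark output so benchstat can diff it.
make BenchmarkWorkflowArchive > main.txt       # on the main branch
make BenchmarkWorkflowArchive > optimized.txt  # on this branch, with the migration applied
benchstat main.txt optimized.txt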