Open andrewm-aero opened 3 months ago
Yea Maximum call stack size exceeded
is highly likely to be due to an infinite loop
You may or may not be able to workaround this with the alternate visualization algorithm in the UI, usable when you toggle the "lightning bolt" button.
I will be working on producing a minimal example tomorrow.
That would also be helpful to understand what scenario is going on as I didn't quite follow the description. "Overwritten" as in used the same location? Since artifacts otherwise don't overlap
Yea
Maximum call stack size exceeded
is highly likely to be due to an infinite loopYou may or may not be able to workaround this with the alternate visualization algorithm in the UI, usable when you toggle the "lightning bolt" button.
This isn't an option unfortunately, it crashes immediately before even displaying the toolbar. Toggling the icon in another workflow and then returning to the affected workflow page produces the same error.
I will be working on producing a minimal example tomorrow.
That would also be helpful to understand what scenario is going on as I didn't quite follow the description. "Overwritten" as in used the same location? Since artifacts otherwise don't overlap
Yes. Consider the scenario where step A takes an input artifact from a specific key from a specific bucket, then step B has an output artifact stored in the same bucket and at the same key. The actual workflows have several steps in between, but that's the gist of it.
After some work, I was able to get a version of the workflow with no identifying info that reproduces the behavior. One thing I noticed, but didn't expect, was that simply having the cycle did not cause the effect, and if any of the {"archive": {"none": {}}}
's are removed, the crash doesn't happen.
Thanks for providing the repro!
One thing I noticed, but didn't expect, was that simply having the cycle did not cause the effect, and if any of the
{"archive": {"none": {}}}
's are removed, the crash doesn't happen.
Yea I was thinking about this, and the artifacts shouldn't be used during graph iteration. I have a feeling that either the status got corrupted somehow or the deserialization somehow corrupted the object in-memory, causing a circular reference somewhere. The latter code hasn't changed in years, so my gut thought is that it's the status (and there have been some wacky bugs found in the artifact status processing recently like #12845). But TBD
somewhere in https://github.com/argoproj/argo-workflows/blob/5d893b161bbe4833d578be9f6c0322849215c23f/ui/src/shared/components/graph/graph-panel.tsx#L143 https://github.com/argoproj/argo-workflows/blob/5d893b161bbe4833d578be9f6c0322849215c23f/ui/src/shared/components/graph/layout.ts#L8 https://github.com/argoproj/argo-workflows/blob/5d893b161bbe4833d578be9f6c0322849215c23f/ui/src/shared/components/graph/fast-layout.ts#L7 https://github.com/argoproj/argo-workflows/blob/5d893b161bbe4833d578be9f6c0322849215c23f/ui/src/shared/components/graph/coffman-graham-sorter.ts#L15 https://github.com/argoproj/argo-workflows/blob/5d893b161bbe4833d578be9f6c0322849215c23f/ui/src/shared/components/graph/dfs-sorter.ts#L22 https://github.com/argoproj/argo-workflows/blob/5d893b161bbe4833d578be9f6c0322849215c23f/ui/src/shared/components/graph/types.ts#L35
Pre-requisites
:latest
image tag (i.e.quay.io/argoproj/workflow-controller:latest
) and can confirm the issue still exists on:latest
. If not, I have explained why, in detail, in my description below.What happened? What did you expect to happen?
When viewing a workflow in the UI that has an "artifact cycle", that is, an input artifact is later overwritten as an output artifact of a later step/task, the UI crashes with the following error
While this is baseless speculation, I suspect the issue is that whatever code is (correctly) assuming that the node graph is acyclic, is also (incorrectly) assuming that artifacts are acyclic.
For context, this pattern is useful to record a "high watermark", and have the workflow terminate or skip steps if e.g. no new data is available for processing, but update the same object with a new watermark after processing new data. This is particularly useful when combined with a CronWorkflow that periodically checks for and processes new data.
The attached workflow crashes the only ONLY when the "use faster, but less pretty ..." and "toggle artifacts" options are selected, and was tested with the initials contents of the s3 artifact as
{"this": "is a test file"}
.Version(s)
v3.5.8
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.
Logs from the workflow controller
Logs from in your workflow's wait container