Open shiraOvadia opened 1 year ago
Is there any workaround for this? I've set a large value for ttlStrategy.secondsAfterFailure on the workflow to keep failed pods around for longer, but this causes the daemon pods to run for that length of time as well, which is becoming quite costly.
For anyone else struggling with this, I've realised that I hadn't set a podGC.strategy for my root workflow template, which referenced the others. Now I've set this to OnWorkflowCompletion and the daemon pods are being removed as expected.
I looked into the logs of the workflow-controller and found some clues: the template name is not correctly set for the terminateContainers action.
time="2024-01-17T09:32:09.931Z" level=info msg="cleaning up pod" action=terminateContainers key=default/daemon-nginx-s5ggl--452970571/terminateContainers
I then dug into the code and located the function for this. It seems that the podName "default/daemon-nginx-s5ggl--452970571" generated by util.GeneratePodName is missing the templateName part, which in this case is nil.
I'm not sure if this bug is related to initializeNode, in which the TemplateName is set from the orgTmpl. So for a template referencing a workflowTemplate, the value is nil.
I wrote a small workflow that can reproduce the bug. Everything works fine if I use a template directly instead of referencing a workflowTemplate.
Any thoughts on how to fix this? Maybe resolve the template to get the actual podName at runtime like here?
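To illustrate why an empty template name would produce the double hyphen seen in the log key above, here is a minimal sketch. It is not the Argo implementation: podName is a hypothetical helper, and the name layout (workflow name, template name, and a hash of the node name, joined by hyphens) and the node name are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// podName is a hypothetical, simplified stand-in for a pod-name generator.
// Assumed layout: "<workflow>-<template>-<FNV-32a hash of the node name>".
func podName(workflowName, templateName, nodeName string) string {
	h := fnv.New32a()
	_, _ = h.Write([]byte(nodeName))
	return fmt.Sprintf("%s-%s-%d", workflowName, templateName, h.Sum32())
}

func main() {
	wf := "daemon-nginx-s5ggl"
	node := "daemon-nginx-s5ggl[0].nginx" // hypothetical node name

	// Name computed when the template name is known.
	created := podName(wf, "nginx", node)
	// Name computed when the template name is empty, as suspected
	// for nodes that reference a workflowTemplate.
	cleanup := podName(wf, "", node)

	fmt.Println("with template name:   ", created)
	fmt.Println("without template name:", cleanup) // contains "--"
	fmt.Println("names match:", created == cleanup)
}
```

Under these assumptions the two names differ, so a cleanup step that looks up the pod by the second name would not find the pod that is actually running.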
Is the workaround still working with version 3.5.6? I can confirm that we are still affected even after trying the workaround.
What happened/what you expected to happen?
The basic workflow structure: I run a workflow that consists of 2 steps. The first is a (redis) server pod configured as a daemon pod; the second is a consumer pod that gets the IP address of the daemon pod and does something with it.
What I expect to happen: when the workflow ends, the daemon pod stops as well.
What actually happens: if I call the daemon pod via a "regular" template, the daemon pod ends as expected when the workflow ends. But if I call the daemon pod via templateRef, the daemon pod does not stop when the workflow ends and keeps running.
Version
3.4.8, 3.5.0.rc1
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container