Open oliverdain opened 6 months ago
Follow-up to this Slack thread. This sounds like a good use-case to me and in general a use-case of "I want to delete some Pods of a Workflow sooner or later than others" makes sense to me, as some are more important to either keep around or delete than others.
> on a per-template or per-step basis

A step is a template, so I've simplified the title

> pod GC policy

Also the field is `podGC.strategy`, not policy, so I updated that too
Summary
Currently the `podGC` policy is set globally for the entire workflow. Sometimes it's helpful to be able to override it for a single step.

Use Cases
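For reference, the workflow-wide setting described in the summary looks roughly like the sketch below. The workflow name, template names, and image are hypothetical placeholders. Only `spec.podGC.strategy` and its values come from the Argo Workflows spec.

```yaml
# Minimal sketch; names and image are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-gc-global-
spec:
  entrypoint: main
  # Set once at the workflow level and applied to every pod in the workflow.
  # Valid strategies: OnPodCompletion, OnPodSuccess,
  # OnWorkflowCompletion, OnWorkflowSuccess.
  podGC:
    strategy: OnWorkflowCompletion
  templates:
    - name: main
      steps:
        - - name: populate
            template: populate
        - - name: long-job
            template: long-job
    - name: populate
      container:
        image: alpine:3.19
        command: [sh, -c, "echo populating volume"]
    - name: long-job
      container:
        image: alpine:3.19
        command: [sh, -c, "sleep 3600"]
```

With `OnWorkflowCompletion`, the `populate` pod is kept until `long-job` and the rest of the workflow have finished, which is the behavior this request would like to override for a single template.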
We have a use case where we have to create a persistent volume using a resource template in one step of a workflow. That volume is big and backed by fast storage, so it's expensive. Then we populate it with data. Finally we clone it into a `ReadOnlyMany` persistent volume so that we can run many long-running ML jobs that all use the same data. We do the clone bit because GCP doesn't let you change a volume from `ReadWriteOnce` to `ReadOnlyMany` and the fast storage types don't support `ReadWriteMany`. So, after we've created our read-only, sharable volume we no longer need the original, writable volume. And that volume is holding many TBs of data, so it's expensive. The ML jobs run for several days, so we don't want to pay for the volume if we don't need it. So I have a workflow step to delete the persistent volume when the read-only volume is ready. But while that step does immediately change the status of the volume claim to `Terminating`, it's still bound to the pod that populated it, so if my global GC policy is something that causes that pod to stick around until the workflow is complete, the volume never gets cleaned up. I'd love to be able to override the GC policy for just that one task.

Message from the maintainers:
Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.