argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Ability to override pod GC strategy on a per-template basis #13074

Open oliverdain opened 4 months ago

oliverdain commented 4 months ago

Summary

Currently the podGC strategy is set globally for the entire workflow. Sometimes it's helpful to be able to override it for a single step.
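
For reference, here is a minimal sketch of how pod GC is configured today, at the workflow level. `spec.podGC.strategy` is the existing field; the workflow name, template name, and image are placeholders chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-gc-example-   # placeholder name
spec:
  entrypoint: main
  # Applies to every pod in the workflow; there is no per-template equivalent.
  podGC:
    # Other values: OnPodCompletion, OnPodSuccess, OnWorkflowSuccess
    strategy: OnWorkflowCompletion
  templates:
    - name: main
      container:
        image: busybox            # placeholder image
        command: [echo, hello]
```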

Use Cases

We have a use case where one step of a workflow creates a persistent volume using a resource template. That volume is big and backed by fast storage, so it's expensive. We then populate it with data, and finally we clone it into a ReadOnlyMany persistent volume so that we can run many long-running ML jobs that all use the same data. We do the clone because GCP doesn't let you change a volume from ReadWriteOnce to ReadOnlyMany, and the fast storage types don't support ReadWriteMany.

Once we've created the read-only, shareable volume, we no longer need the original writable volume. That volume holds many TBs of data, so it's expensive, and the ML jobs run for several days, so we don't want to pay for it when we don't need it. So I have a workflow step that deletes the persistent volume when the read-only volume is ready.

But while that step does immediately change the status of the volume claim to Terminating, the claim is still bound to the pod that populated it. If my global GC strategy keeps that pod around until the workflow is complete, the volume never gets cleaned up. I'd love to be able to override the GC strategy for just that one task.
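
A sketch of what such an override might look like, as a manifest fragment. The template-level `podGC` block is hypothetical: it is the field this issue asks for, not something Argo Workflows supports today. The template name, image, and claim name are placeholders.

```yaml
spec:
  podGC:
    strategy: OnWorkflowCompletion        # existing workflow-wide default
  templates:
    - name: populate-volume               # placeholder name
      # HYPOTHETICAL: a template-level podGC block is what this issue requests;
      # it is not an existing Argo Workflows field.
      podGC:
        strategy: OnPodCompletion
      container:
        image: busybox                    # placeholder image
        command: [sh, -c, "echo populating /data"]
        volumeMounts:
          - name: scratch
            mountPath: /data
      volumes:
        - name: scratch
          persistentVolumeClaim:
            claimName: fast-writable-pvc  # placeholder claim name
```

With an override like this, the populating pod would be deleted as soon as it completes, so the Terminating claim would no longer be held by it, while every other pod would still follow the workflow-wide strategy.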


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

agilgur5 commented 4 months ago

Follow-up to this Slack thread. This sounds like a good use-case to me and in general a use-case of "I want to delete some Pods of a Workflow sooner or later than others" makes sense to me, as some are more important to either keep around or delete than others.

"on a per-template or per-step basis"

A step is a template, so I've simplified the title.

"pod GC policy"

Also, the field is podGC.strategy, not policy, so I updated that too.