Closed: gcapizzi closed this 2 years ago
This is a no-op. The controller owner reference is already set and checked on the Job.
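For context, wiring the owner reference up with controller-runtime looks roughly like the sketch below. The helper name is made up and this is not the actual Korifi reconciler code, just an illustration of the mechanism the garbage collector relies on.

```go
// Illustrative sketch only (not Korifi's actual reconciler): create the Job
// with the TaskWorkload recorded as its controller owner, so cascading
// deletion and ownership checks both work off the owner reference.
package controllers

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// createOwnedJob is a hypothetical helper; owner would be the TaskWorkload.
func createOwnedJob(ctx context.Context, c client.Client, scheme *runtime.Scheme, owner client.Object, job *batchv1.Job) error {
	// Writes an ownerReference with controller: true onto the Job's metadata.
	if err := controllerutil.SetControllerReference(owner, job, scheme); err != nil {
		return err
	}
	return c.Create(ctx, job)
}
```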
I've manually verified that creating a taskworkload with a long sleep successfully creates a pod, and that deleting the taskworkload causes the pod to terminate and disappear.
Example taskworkload:
```yaml
apiVersion: korifi.cloudfoundry.org/v1alpha1
kind: TaskWorkload
metadata:
  name: taskworkload-sample
spec:
  image: alpine
  command:
    - sleep
    - "100"
```
@kieron-dev I think we might still be leaving dangling pods, just like Eirini does, until we finish #1461. I'm wondering whether the termination happens even when the pod is not deleted? 🤔
The pods are being deleted though. So it looks like garbage collection is happening anyway.
It's confusing! I'm reading the docs and here it says:
> When you delete the job using kubectl, all the pods it created are deleted too.
but here it says:
> It is recommended to set ttlSecondsAfterFinished field because unmanaged jobs (Jobs that you created directly, and not indirectly through other workload APIs such as CronJob) have a default deletion policy of orphanDependents causing Pods created by an unmanaged Job to be left around after that Job is fully deleted. Even though the control plane eventually garbage collects the Pods from a deleted Job after they either fail or complete, sometimes those lingering pods may cause cluster performance degradation or in worst case cause the cluster to go offline due to this degradation.
So yeah, I guess what we're seeing is garbage collection, but wouldn't the problem persist even after setting ttlSecondsAfterFinished for Jobs that are manually deleted before completion? Maybe we should force a different deletion policy?
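For illustration, here's a rough sketch (the Job fields are real, the surrounding builder is hypothetical) of what setting ttlSecondsAfterFinished would look like when building the Job. Note that the TTL only applies once the Job has finished, which is exactly why it wouldn't help for Jobs deleted mid-run.

```go
// Sketch only: a hypothetical Job builder that sets ttlSecondsAfterFinished.
// The TTL controller deletes the Job (and its Pods) this many seconds after
// it completes or fails; it does nothing for Jobs deleted before completion.
package controllers

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func buildJob(name, namespace, image string, command []string) *batchv1.Job {
	ttl := int32(300) // keep finished Jobs for 5 minutes, then let the TTL controller clean up

	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: batchv1.JobSpec{
			TTLSecondsAfterFinished: &ttl,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{Name: "workload", Image: image, Command: command},
					},
				},
			},
		},
	}
}
```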
Also here:
> The control plane cleans up terminated Pods (with a phase of Succeeded or Failed), when the number of Pods exceeds the configured threshold (determined by terminated-pod-gc-threshold in the kube-controller-manager). This avoids a resource leak as Pods are created and terminated over time.
So I don't understand how we're seeing garbage collection in action, unless our threshold is 0 or very low? 🤔
@kieron-dev I think I got it: it's all about the propagationPolicy provided in the DeleteOptions. The Job resource has a default policy of Orphan, but kubectl defaults to Background! This might also be the case for k9s.
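To make the distinction concrete: the propagation policy is a property of the delete request, not of the resource. Assuming controller-runtime, forcing a cascading delete would look roughly like this sketch (the function name is illustrative); kubectl's default corresponds to the Background policy used here.

```go
// Sketch: the propagation policy is part of the delete request, not of the
// Job resource itself. With controller-runtime, forcing a cascade would look
// roughly like this.
package controllers

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func deleteJobWithCascade(ctx context.Context, c client.Client, job *batchv1.Job) error {
	// Background deletes the Job immediately and lets the garbage collector
	// remove its Pods; Foreground would block the Job's deletion on them;
	// Orphan leaves the Pods behind.
	return c.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground))
}
```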
I am assuming that the policy propagates all the way down, so deleting a TaskWorkload with a Background policy also leads to the Job being deleted in the same way. On the other hand, since we haven't set a default propagation policy for TaskWorkload, deletions are still using Background, but probably not for the Job, as we've seen cases of dangling pods. I'm going to take a closer look.
Ok, I think I solved the mystery 😅 The key is this sentence:

> unmanaged jobs (Jobs that you created directly, and not indirectly through other workload APIs such as CronJob) have a default deletion policy of orphanDependents

Our Jobs are not unmanaged, as we create them and set the TaskWorkload as their parent. I think this means the TaskWorkload propagation policy (Background, the default one) overrides the Job one, and the Pod gets deleted in the background, as we expected.
Background
As an operator
I want to replace Korifi's task runner
So that I can customise the way my system runs tasks

Acceptance Criteria

GIVEN I have created a TaskWorkload
AND the command has started running
WHEN I delete the TaskWorkload
THEN I see the associated Pod disappear

Dev Notes

This should only be a matter of setting owner references properly.