cloudfoundry / korifi

Cloud Foundry on Kubernetes
Apache License 2.0

[Feature]: `kubectl` user can kill a `TaskWorkload` execution by deleting its resource #1406

Closed gcapizzi closed 2 years ago

gcapizzi commented 2 years ago

Background

As an operator
I want to replace Korifi's task runner
So that I can customise the way my system runs tasks

Acceptance Criteria

GIVEN I have created a TaskWorkload
AND the command has started running
WHEN I delete the TaskWorkload
THEN I see the associated Pod disappear

Dev Notes

This should only be a matter of setting owner references properly.
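For illustration, "setting owner references properly" with controller-runtime boils down to the standard controllerutil call below. The helper name and its surroundings are hypothetical; this is a minimal sketch, not Korifi's actual code.

import (
	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// setJobOwner is a hypothetical helper: it marks the owning workload (here,
// the TaskWorkload) as the controlling owner of the Job it spawns, so that
// deleting the owner cascades to the Job and, through the Job, to its Pods.
func setJobOwner(owner client.Object, job *batchv1.Job, scheme *runtime.Scheme) error {
	// SetControllerReference also sets controller/blockOwnerDeletion to true,
	// which is what makes this a *controller* owner reference.
	return controllerutil.SetControllerReference(owner, job, scheme)
}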

kieron-dev commented 2 years ago

This is a no-op. The controller owner reference is already set and checked on the Job.

I've manually verified that creating a taskworkload with a long sleep successfully creates a pod, and that deleting the taskworkload causes the pod to terminate and disappear.

Example taskworkload:

apiVersion: korifi.cloudfoundry.org/v1alpha1
kind: TaskWorkload
metadata:
  name: taskworkload-sample
spec:
  image: alpine
  command:
    - sleep
    - "100"
gcapizzi commented 2 years ago

@kieron-dev I think we might still be leaving dangling pods, just like Eirini does, until we finish #1461 - I'm wondering whether the termination happens even when the pod is not deleted? 🤔

kieron-dev commented 2 years ago

The pods are being deleted though. So it looks like garbage collection is happening anyway.

gcapizzi commented 2 years ago

It's confusing! I'm reading the docs, and one part of the Jobs documentation says:

When you delete the job using kubectl, all the pods it created are deleted too.

but another part says:

It is recommended to set ttlSecondsAfterFinished field because unmanaged jobs (Jobs that you created directly, and not indirectly through other workload APIs such as CronJob) have a default deletion policy of orphanDependents causing Pods created by an unmanaged Job to be left around after that Job is fully deleted. Even though the control plane eventually garbage collects the Pods from a deleted Job after they either fail or complete, sometimes those lingering pods may cause cluster performance degradation or in worst case cause the cluster to go offline due to this degradation.

So yeah, I guess what we're seeing is garbage collection, but wouldn't the problem persist even after setting ttlSecondsAfterFinished for Jobs that are manually deleted before completion? Maybe we should force a different deletion policy?
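For reference, ttlSecondsAfterFinished is just a field on the Job spec. A rough sketch of where it would sit if the Job builder set it (the constructor below is hypothetical, not Korifi's); as the quote above says, the TTL only applies once the Job actually finishes, so it would not help with Jobs deleted mid-run.

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildTaskJob is a hypothetical Job constructor showing where the TTL lives.
func buildTaskJob(name, namespace string) *batchv1.Job {
	ttl := int32(30) // delete the Job (and its Pods) 30 seconds after it finishes
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: batchv1.JobSpec{
			TTLSecondsAfterFinished: &ttl,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{Name: "task", Image: "alpine", Command: []string{"sleep", "100"}},
					},
				},
			},
		},
	}
}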

gcapizzi commented 2 years ago

Also, on Pod garbage collection, the docs say:

The control plane cleans up terminated Pods (with a phase of Succeeded or Failed), when the number of Pods exceeds the configured threshold (determined by terminated-pod-gc-threshold in the kube-controller-manager). This avoids a resource leak as Pods are created and terminated over time.

So I don't understand how we're seeing garbage collection in action, unless our threshold is 0 or very low? 🤔

gcapizzi commented 2 years ago

@kieron-dev I think I got it: it's all about the propagationPolicy provided in the DeleteOptions. The Job resource has a default policy of Orphan, but kubectl defaults to Background! This might also be the case for k9s.

I'm assuming that the policy propagates all the way down, so deleting a TaskWorkload with a Background policy also leads to the Job being deleted in the same way. On the other hand, since we haven't set a default propagation policy for TaskWorkload, deletions are still using Background, but probably not for the Job, as we've seen cases of dangling pods. I'm going to take a closer look.
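For illustration, a client can make the propagation explicit instead of relying on kubectl's or the API server's defaults. A minimal controller-runtime sketch (one possible approach, not necessarily what Korifi ended up doing):

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteWithPropagation deletes an object with an explicit propagation policy.
// Background removes the owner first and lets garbage collection clean up the
// dependents; Foreground removes the dependents before the owner disappears;
// Orphan leaves the dependents behind.
func deleteWithPropagation(ctx context.Context, c client.Client, obj client.Object, policy metav1.DeletionPropagation) error {
	return c.Delete(ctx, obj, client.PropagationPolicy(policy))
}

For example, deleteWithPropagation(ctx, c, job, metav1.DeletePropagationBackground) reproduces what kubectl does by default.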

gcapizzi commented 2 years ago

Ok, I think I solved the mystery 😅 The key is this sentence:

unmanaged jobs (Jobs that you created directly, and not indirectly through other workload APIs such as CronJob) have a default deletion policy of orphanDependents

Our Jobs are not unmanaged, as we create them and set the TaskWorkload as their parent. I think this means the TaskWorkload's propagation policy (Background, the default) overrides the Job's, and the Pod gets deleted in the background, as we expected.