gianlucam76 / k8s-cleaner

Cleaner is a Kubernetes controller that identifies unused or unhealthy resources, helping you maintain a streamlined and efficient Kubernetes cluster. It provides flexible scheduling, label filtering, Lua-based selection criteria, resource removal or update, and notifications via Slack, Webex and Discord. It can also automate cluster operations.
https://projectsveltos.github.io/sveltos/
Apache License 2.0

Cleaner instance for Terminating Pods #127

Closed (aminmr closed this 2 months ago)

aminmr commented 2 months ago

Hi, I’m trying to implement a cleaner for terminating pods in my Kubernetes cluster using this project. However, I’ve tried several approaches, and none of them seem to work effectively.

What I’ve Tried:

apiVersion: apps.projectsveltos.io/v1alpha1
kind: Cleaner
metadata:
  name: completed-pods
spec:
  schedule: "*/5 * * * *"
  resourcePolicySet:
    resourceSelectors:
    - kind: Pod
      group: ""
      version: v1
      evaluate: |
        function evaluate()
          hs = {}
          hs.matching = false

          -- Check if the pod has a deletionTimestamp field (i.e., pod is terminating)
          if obj.metadata.deletionTimestamp ~= nil then
            -- If deletionTimestamp has a value, the pod is terminating
            hs.matching = true
          end

          return hs
        end
  action: Scan

It seems like I’m missing something or not handling pod termination conditions correctly.

My Objective:

I want to remove pods that are stuck in the terminating state for a certain period of time (e.g., more than 5 minutes).

Could you provide any advice or sample code to help me write this cleaner for terminating pods? Any help or guidance would be greatly appreciated!

gianlucam76 commented 2 months ago

Thank you @aminmr. I will give it a try in a few hours and get back to you. Thanks.

gianlucam76 commented 2 months ago

Hi @aminmr, there was nothing wrong with your Cleaner instance. By default, k8s-cleaner simply excluded resources marked as deleted.

I added a new field to the CRD:

    // ExcludeDeleted if set (default value), exclude resources marked as
    // deleted. If set to false, k8s-cleaner will consider also resources marked as deleted.
    // +kubebuilder:default:=true
    ExcludeDeleted bool `json:"excludeDeleted,omitempty"`

By default this field is true, so k8s-cleaner keeps ignoring deleted resources unless told otherwise, which makes the change backward compatible.

In your case, simply set excludeDeleted to false.

I just released version v0.8.0 for that (you need to upgrade the CRDs along with the image).

gianlucam76 commented 2 months ago

Added support for this in v0.8.0

aminmr commented 2 months ago

Thanks a lot, @gianlucam76. I checked, and it is fixed. For terminating pods, we sometimes need to force delete the resource. Is there a feature for force deletion?

gianlucam76 commented 2 months ago

Hi @aminmr, maybe you can try instructing k8s-cleaner to modify the Pod by setting TerminationGracePeriodSeconds, but I am not sure that will force kill it.

If there is a field you can use, set Action to Transform and then use the Transform field:

    // Transform contains a function "transform" in lua language.
    // When Action is set to *Transform*, this function will be invoked
    // and be passed one of the object selected based on
    // above criteria.
    // Must return the new object that will be applied
    // +optional
    Transform string `json:"transform,omitempty"`

aminmr commented 2 months ago

Unfortunately, it's not working. deletionGracePeriodSeconds is immutable, so a transform can't change it in the manifest. Do you have a suggestion for a ForceDelete feature? Since I think it would only be usable for Pod resources, adding a new action doesn't seem reasonable. I'm happy to volunteer to contribute, @gianlucam76.

gianlucam76 commented 2 months ago

Hi @aminmr, can you look into how kubectl handles --force?

When I want to force delete resources, I usually remove the finalizer, but I am not sure what kubectl does for pods.

aminmr commented 2 months ago

I checked the kubectl output with -v9. A force delete adds gracePeriodSeconds set to 0 to the body of the delete API request:

Request Body: {"gracePeriodSeconds":0,"propagationPolicy":"Background"}

If we want to add this switch in the Delete action, we need to set this in the request body. But it only works for pod resources; other resources don't have a force-delete concept.

My suggestion is to add a forceDelete field that defaults to false; to force delete, you would set it to true in the Cleaner manifest. I don't know if that's the best practice. Thank you for answering me, @gianlucam76.

gianlucam76 commented 2 months ago

Thanks @aminmr

I see that the controller-runtime DeleteOptions has:

type DeleteOptions struct {
    // GracePeriodSeconds is the duration in seconds before the object should be
    // deleted. Value must be non-negative integer. The value zero indicates
    // delete immediately. If this value is nil, the default grace period for the
    // specified type will be used.
    GracePeriodSeconds *int64

Since this is exposed generically as a delete option, I can expose it as well. Can you please file an enhancement request? I will implement it tomorrow and then ask you to verify it. Thank you.

gianlucam76 commented 2 months ago

@aminmr can you check if this works?

You can fork the repo, then check out the delete-options branch and run "make create-cluster", which creates a local kind cluster with k8s-cleaner.

aminmr commented 2 months ago

Hello again @gianlucam76

Thank you very much for your PR related to my problem. I tested your branch with a kind cluster, and it works fine and deleted all terminating pods. I also created a PR for the examples related to this feature.

Thank you very much for your time.

gianlucam76 commented 2 months ago

@aminmr thank you so much for testing this. I will merge both PRs (this one and yours) tomorrow and release v0.9.0.