aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Karpenter 0.37 Upgrade: Generic Ephemeral Volumes Not Deleting After Pod Removal Without Enabling Webhook #6997

Open apjneeraj opened 1 month ago

apjneeraj commented 1 month ago

Description

Observed Behavior:

We use generic ephemeral volumes in one of our use cases, and the lifecycle of these volumes follows the lifecycle of the pod. Up to Karpenter 0.36.x, those volumes (PVCs/PVs) were deleted automatically as soon as the pod was deleted.

We upgraded to Karpenter 0.37.2, which has a webhook that is disabled by default. That breaks some functionality because both the v1 and v1beta1 APIs are present, and we were unable to run kubectl get nodepool|nodeclaims|ec2nodeclasses directly without the API suffix. That did not appear to break any server-side functionality until we noticed hundreds of EBS volumes in the available state, meaning the Pods that used those volumes are already gone but the underlying PVCs and volumes are still lying around. That was not the case before the upgrade.

Further investigation showed that the only recent change in the cluster was Karpenter, and the Grafana dashboard shows a spike in PVCs after the Karpenter upgrade.

Due to other CRD and webhook issues in the Karpenter chart, related to https://github.com/aws/karpenter-provider-aws/issues/6847 and https://github.com/aws/karpenter-provider-aws/issues/6867, there is no direct way for us to use Flux to change the default namespace hardcoded in the CRDs in the main chart.

Workaround: We had to manually update the CRDs in the cluster and then enable the webhook, which is enabled by default in the more recent 0.37.3 chart version. After that, we stopped observing the issue in our cluster and the number of PVCs remained steady.

Expected Behavior: Upgrading to chart 0.37.2 without enabling the webhook should still work, and PVCs created through generic ephemeral volumes should be cleaned up automatically, as they were with 0.36.x.

Question: How is enabling the webhook, or Karpenter in general, involved in deleting those PVCs or PVs? My understanding is that Karpenter works with the scheduler and is not directly involved in creating or deleting PVCs. Is there anything Karpenter started doing through the webhook that blocks PVC deletion?

Reproduction Steps (Please include YAML):

  1. Upgrade to Karpenter chart 0.37.2 and do not enable the webhook.
  2. Create a sample pod using manifest from https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes.
  3. A new pod my-app and a PVC my-app-scratch-volume will be created.
  4. Run kubectl get pvc my-app-scratch-volume -oyaml to see the ownerReferences. They will look something like this:
     ownerReferences:
     - apiVersion: v1
       blockOwnerDeletion: true
       controller: true
       kind: Pod
       name: my-app
  5. Delete the pod: kubectl delete pod my-app
  6. Observe that the PVC created in step 2 still exists and is not cleaned up after the pod is deleted: kubectl get pvc
  7. Check the AWS EC2 Console for the EBS volume created for the PVC above. The volume ID can be found with the commands below. The volume will be in the available state and free to be deleted.
     1. kubectl describe pvc my-app-scratch-volume | grep -i volume:
     2. kubectl describe pv <pv name from above step> | grep -i VolumeHandle:
  8. The underlying EBS volume is deleted only if we delete the PVC manually.
  9. If we enable the webhook and update the CRDs to override the default namespace for Karpenter, everything returns to normal.
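
To check how many PVCs are affected across a cluster rather than one at a time, the ownerReferences check from the steps above can be scripted. The following is a minimal sketch, not something from the Karpenter project. It assumes the two inputs are the parsed JSON from `kubectl get pvc -A -o json` and `kubectl get pods -A -o json`, and it relies on each ownerReference carrying a `uid` field (omitted from the abbreviated output in step 4). The function name `find_orphaned_pvcs` is hypothetical.

```python
from typing import Any


def find_orphaned_pvcs(pvc_list: dict[str, Any], pod_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for every PVC owned by a Pod that no longer exists.

    Both arguments are assumed to be Kubernetes List objects as printed by
    `kubectl get ... -o json` (a dict with an "items" array).
    """
    # UIDs of all Pods that are still present in the cluster.
    live_pod_uids = {pod["metadata"]["uid"] for pod in pod_list.get("items", [])}

    orphaned = []
    for pvc in pvc_list.get("items", []):
        meta = pvc["metadata"]
        # PVCs without ownerReferences are not generic ephemeral volumes; skip them.
        for ref in meta.get("ownerReferences", []):
            if ref.get("kind") == "Pod" and ref.get("uid") not in live_pod_uids:
                orphaned.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
                break
    return orphaned
```

To use it, save the two kubectl outputs to files, load each with `json.load`, and pass the resulting dicts to the function. Any PVC it reports should normally have been removed by the Kubernetes garbage collector once its owning Pod was deleted.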

Versions:

apjneeraj commented 1 month ago

Is there anything else I can provide or any more information needed to get some insights here? Thanks