Description
Observed Behavior:
We use generic ephemeral volumes in one of our use cases, and the lifecycle of these volumes follows the lifecycle of the pod. Up to Karpenter 0.36.x, those volumes (PVCs/PVs) were deleted automatically as soon as the pod was deleted.
We upgraded to Karpenter 0.37.2, which has a webhook that is disabled by default. That broke some functionality because both the v1 and v1beta1 APIs are present, and we were unable to directly use `kubectl get nodepool|nodeclaims|ec2nodeclasses` without an API suffix. That did not break any server-side functionality, until we noticed hundreds of EBS volumes in the `available` state, meaning the pods that used those volumes were already gone but the underlying PVCs and volumes were still lying around. Earlier, that was not the case.
Further investigation showed that the only recent change in the cluster was Karpenter, and the Grafana dashboard showed a spike in PVCs after the Karpenter upgrade.
Due to other CRD and webhook issues in the Karpenter chart, related to https://github.com/aws/karpenter-provider-aws/issues/6847 and https://github.com/aws/karpenter-provider-aws/issues/6867, there is no direct way for us to use Flux to change the default namespace hardcoded in the CRDs in the main chart.
Workaround: We had to manually update the CRDs in the cluster and then enable the webhook, which is enabled by default in the more recent 0.37.3 chart version. After that, we stopped observing the issue in our cluster and the number of PVCs remained at a steady state.
Expected Behavior: Upgrading to chart 0.37.2 without enabling the webhook should still work, and PVCs created through generic ephemeral volumes should be cleaned up automatically, as they were with 0.36.x.
Question: How is enabling the webhook, or Karpenter in general, involved in deleting those PVCs or PVs? My understanding is that Karpenter works with the scheduler and is not directly involved in creating or deleting PVCs. Is there anything Karpenter started doing through the webhook that blocks PVC deletion?
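For context on the question: as far as I understand, the PVC behind a generic ephemeral volume carries an ownerReference to its pod, and it is the Kubernetes garbage collector that removes the PVC once the owning pod is gone. A sketch of what that metadata typically looks like on the PVC is below. The UID is a placeholder, and the pod and PVC names are the ones used in the reproduction steps.

```yaml
# Excerpt of a PVC created for a generic ephemeral volume (illustrative only).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-scratch-volume
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: my-app
      uid: 00000000-0000-0000-0000-000000000000  # placeholder, not a real UID
      controller: true
      blockOwnerDeletion: true
```

If the ownerReference is present and the pod no longer exists, the PVC would normally be expected to disappear shortly after the pod does.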
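Reproduction Steps (Please include YAML):
1. Upgrade to Karpenter chart 0.37.2 and don't enable the webhook.
2. Create a pod that uses a generic ephemeral volume. A pod `my-app` and a PVC `my-app-scratch-volume` will be created.
3. Run `kubectl get pvc my-app-scratch-volume -oyaml` to see the ownerReference.
4. Delete the pod: `kubectl delete pod my-app`
5. Observe that the PVC created in step 2 is still present and not cleaned up after pod deletion: `kubectl get pvc`
6. Check the AWS EC2 Console for the EBS volume created for the above PVC. The volume will be in the `available` state and free to be deleted.
Versions: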
Chart Version: 0.37.2
Kubernetes Version (`kubectl version`): 1.29.0
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
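The original manifest for the reproduction pod is not shown above. Below is a minimal sketch of a pod that would produce a PVC named `my-app-scratch-volume`, since the PVC of a generic ephemeral volume is named `<pod name>-<volume name>`. The image, mount path, storage size, and the `gp3` StorageClass name are assumptions for illustration, not values taken from the affected cluster.

```yaml
# Hypothetical reproduction manifest; adjust image, size, and StorageClass to your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch-volume
          mountPath: /scratch
  volumes:
    - name: scratch-volume        # PVC name becomes my-app-scratch-volume
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: gp3  # assumed name of an EBS-backed StorageClass
            resources:
              requests:
                storage: 1Gi
```

Applying this with `kubectl apply -f`, then running `kubectl delete pod my-app` followed by `kubectl get pvc`, should show whether the PVC is cleaned up along with the pod.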