muckelba opened this issue 1 month ago (status: Open)
Disruption refers to voluntary disruption modes, e.g. Drift, Expiration, and Consolidation. None of these can take place when the NodePool or NodeClass does not exist, which is why Karpenter can't disrupt the NodeClaim. That doesn't mean Karpenter can't terminate the NodeClaim. Deleting the NodeClass should result in Karpenter setting a deletion timestamp on each NodeClaim associated with that NodeClass, and those NodeClaims will then terminate gracefully. Graceful termination isn't bounded: blocking PDBs can hold up a NodeClaim's termination indefinitely.
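To make the distinction above concrete, here is a toy model (not Karpenter's code; all class and function names are hypothetical) of the two separate paths: voluntary disruption, which requires the owning NodePool and NodeClass to exist, and termination, which is driven by the deletion timestamp and only completes once nothing such as a PDB blocks the drain.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class NodeClaim:
    name: str
    nodepool: str
    nodeclass: str
    deletion_timestamp: Optional[float] = None  # set once deletion is requested


def can_disrupt(claim: NodeClaim, nodepools: Set[str], nodeclasses: Set[str]) -> bool:
    """Voluntary disruption (Drift, Expiration, Consolidation) needs both owners to exist."""
    return claim.nodepool in nodepools and claim.nodeclass in nodeclasses


def mark_for_deletion(nodeclass: str, claims: List[NodeClaim], now: float) -> List[str]:
    """On NodeClass deletion, set a deletion timestamp on every NodeClaim that references it."""
    marked = []
    for claim in claims:
        if claim.nodeclass == nodeclass and claim.deletion_timestamp is None:
            claim.deletion_timestamp = now
            marked.append(claim.name)
    return marked


def termination_can_complete(claim: NodeClaim, blocking_pdbs: int) -> bool:
    """Graceful termination finishes only once nothing (e.g. a PDB) blocks the drain."""
    return claim.deletion_timestamp is not None and blocking_pdbs == 0


claims = [
    NodeClaim("common-zvvgb", "common", "default"),
    NodeClaim("common-n7mqv", "common", "default"),
]
# NodePool and NodeClass are gone, so voluntary disruption is blocked...
print(can_disrupt(claims[0], nodepools=set(), nodeclasses=set()))  # False
# ...but termination is still driven by the deletion timestamp.
print(mark_for_deletion("default", claims, now=0.0))  # ['common-zvvgb', 'common-n7mqv']
print(termination_can_complete(claims[0], blocking_pdbs=0))  # True
```

In this model a missing NodePool only affects `can_disrupt`; whether a node actually goes away depends solely on `termination_can_complete`.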
If you're able to share Karpenter logs and the NodeClaim resources we should be able to determine if Karpenter is operating correctly. If it is and you want to be able to set an upper bound on termination time, you'll probably be interested in https://github.com/kubernetes-sigs/karpenter/pull/916 which just merged in the upstream repo.
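The idea of an upper bound on termination time can be sketched as a deadline check: once the deletion timestamp plus some grace period has passed, stop waiting on the graceful drain. This assumes a single optional per-NodeClaim grace period; the function and parameter names are hypothetical and do not describe the exact API introduced by the linked PR.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def termination_deadline(deletion_timestamp: datetime,
                         grace_period: Optional[timedelta]) -> Optional[datetime]:
    """Latest time to keep waiting on a graceful drain; None means unbounded."""
    if grace_period is None:
        # Unbounded behaviour as described above: blocking PDBs can stall forever.
        return None
    return deletion_timestamp + grace_period


def should_force_terminate(now: datetime, deletion_timestamp: datetime,
                           grace_period: Optional[timedelta]) -> bool:
    """True once the (hypothetical) grace period has fully elapsed."""
    deadline = termination_deadline(deletion_timestamp, grace_period)
    return deadline is not None and now >= deadline


deleted_at = datetime(2024, 7, 24, 11, 0, tzinfo=timezone.utc)
now = deleted_at + timedelta(hours=3)
print(should_force_terminate(now, deleted_at, None))                # False
print(should_force_terminate(now, deleted_at, timedelta(hours=1)))  # True
```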
Hey, thank you for your explanation. I just did some more testing: even without any PDBs in the cluster (except for Karpenter's own, but that runs on Fargate), the nodes won't terminate.
Karpenter logs this error every 10 seconds or so:
{"level":"ERROR","time":"2024-07-24T11:23:01.154Z","logger":"controller.disruption","message":"listing instance types for common, resolving node class, ec2nodeclasses.karpenter.sh \"default-ec2nodeclass\" is terminating, treating as not found","commit":"6b868db"}

Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitingOnNodeClaimTermination 14m (x14 over 175m) karpenter Waiting on NodeClaim termination for common-zvvgb, common-n7mqv
Type Reason Age From Message
---- ------ ---- ---- -------
Normal DisruptionBlocked 3m31s (x88 over 177m) karpenter Cannot disrupt NodeClaim: Owning nodepool "common" not found
Type Reason Age From Message
---- ------ ---- ---- -------
Warning 4m40s (x633 over 21h) karpenter Failed resolving NodeClass
Type Reason Age From Message
---- ------ ---- ---- -------
Normal DisruptionBlocked 70s (x90 over 179m) karpenter Cannot disrupt Node: Owning nodepool "common" not found
That's everything I can find related to the deletion.
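The repeated error reads like a lookup rule: a NodeClass that already has a deletion timestamp is reported as "not found". A toy version of that rule, inferred from the log text only (not from Karpenter's source; the names are hypothetical), shows why every code path that needs the NodeClass keeps failing on each reconcile once deletion starts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class NodeClassNotFound(Exception):
    """Raised when a NodeClass is missing or is being deleted."""


@dataclass
class NodeClass:
    name: str
    deletion_timestamp: Optional[str] = None  # RFC 3339 string once deletion starts


def resolve_nodeclass(name: str, store: Dict[str, NodeClass]) -> NodeClass:
    """Look up a NodeClass, treating a terminating one as if it did not exist."""
    nodeclass = store.get(name)
    if nodeclass is None:
        raise NodeClassNotFound(f'"{name}" not found')
    if nodeclass.deletion_timestamp is not None:
        raise NodeClassNotFound(f'"{name}" is terminating, treating as not found')
    return nodeclass


store = {"default": NodeClass("default")}
print(resolve_nodeclass("default", store).name)  # default

store["default"].deletion_timestamp = "2024-07-24T11:23:01Z"
try:
    resolve_nodeclass("default", store)
except NodeClassNotFound as err:
    # Every caller that needs the NodeClass now fails, on every reconcile.
    print(err)  # "default" is terminating, treating as not found
```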
How does Karpenter's release process work? Once a change is merged in kubernetes-sigs/karpenter, does the cloud-specific provider (AWS in this case) also have to implement and release it?
Description
Observed Behavior: When deleting a NodeClass, Karpenter tries to delete the NodeClaims (Waiting on NodeClaim termination for common-xdbj9, common-vvclr, common-2ppgb), but they suddenly can't find their NodePool anymore (Cannot disrupt NodeClaim: Owning nodepool "common" not found). As soon as the deletion is issued, Karpenter just logs: resolving node class, ec2nodeclasses.karpenter.sh "default" is terminating, treating as not found

Expected Behavior: The NodeClaims delete themselves first, and then the NodeClass is deleted.
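The expected behavior is the ordering that a Kubernetes finalizer provides: an object with a deletion timestamp stays in the API until its finalizers list is empty. The toy simulation below assumes the NodeClass carries a finalizer that is removed only once no NodeClaim references it; the finalizer name and helper functions are hypothetical.

```python
from typing import Dict, List, Set

FINALIZER = "example.com/termination"  # hypothetical finalizer name


def reconcile_nodeclass_deletion(nodeclass: str, claim_refs: Dict[str, str],
                                 finalizers: Set[str]) -> List[str]:
    """Return NodeClaims still referencing the NodeClass; drop the finalizer once none do."""
    blocking = sorted(claim for claim, ref in claim_refs.items() if ref == nodeclass)
    if not blocking:
        finalizers.discard(FINALIZER)
    return blocking


def deletion_order(nodeclass: str, claim_refs: Dict[str, str]) -> List[str]:
    """Simulate the expected order: referencing NodeClaims go first, the NodeClass last."""
    finalizers = {FINALIZER}
    refs = dict(claim_refs)
    order = []
    while True:
        blocking = reconcile_nodeclass_deletion(nodeclass, refs, finalizers)
        if not blocking:
            break
        # Assume one blocking NodeClaim finishes graceful termination per pass.
        # In the reported issue this step never completes, so the loop would stall.
        done = blocking[0]
        del refs[done]
        order.append(done)
    # Finalizer removed, so the NodeClass itself can now be deleted.
    order.append(nodeclass)
    return order


refs = {"common-xdbj9": "default", "common-vvclr": "default", "common-2ppgb": "default"}
print(deletion_order("default", refs))
# ['common-2ppgb', 'common-vvclr', 'common-xdbj9', 'default']
```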
Reproduction Steps (Please include YAML):
kubectl delete ec2nodeclasses.karpenter.k8s.aws default
Versions:
Chart Version: 0.36.0
Kubernetes Version (kubectl version): v1.29.4-eks-036c24b