schahal opened 3 days ago
A bit more clarity:
Prior to `v1.0.4`, Karpenter was able to take the `spec.kubelet.clusterDNS` value of the EC2NodeClass and overlay it onto the node's kubelet config.
With `v1.0.4`+, it keeps the default value we pass into the node user-data:
```toml
[settings]
...
[settings.kubernetes]
...
cluster-dns-ip = '172.20.0.10'
max-pods = 29
...
```
So it's not just `clusterDNS` (e.g., `maxPods` is also kept at 29).
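One way to confirm what the running kubelet actually picked up (a sketch; the node name is a placeholder and `jq` is just for readability) is the kubelet's `configz` endpoint via the API server proxy:

```shell
# Read the kubelet's effective config through the API server's node proxy
# ("ip-10-0-0-1.ec2.internal" is a placeholder node name)
kubectl get --raw "/api/v1/nodes/ip-10-0-0-1.ec2.internal/proxy/configz" \
  | jq '.kubeletconfig | {clusterDNS, maxPods}'
```

On an affected node this should surface the defaults (`172.20.0.10`, 29) rather than the EC2NodeClass values.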
Looking at v1.0.4 release notes, there was a maxPods-related commit (https://github.com/aws/karpenter-provider-aws/pull/7020), but does that vibe with the symptoms of this issue?
Do you have the `compatibility.karpenter.sh/v1beta1-kubelet-conversion` annotation in any of your NodePools? The `compatibility.karpenter.sh/v1beta1-kubelet-conversion` NodePool annotation takes precedence over the EC2NodeClass kubelet configuration when launching nodes.
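For anyone else checking, a quick sketch to list NodePools with this annotation (assumes the standard `kubectl` jsonpath escaping for dotted keys):

```shell
# Print each NodePool name alongside its kubelet-conversion annotation
# (the second column is empty for NodePools without the annotation)
kubectl get nodepools -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.compatibility\.karpenter\.sh/v1beta1-kubelet-conversion}{"\n"}{end}'
```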
Indeed, looking at the NodePools, they have annotations like this (though not explicitly added by us, so probably added by Karpenter itself when we migrated to v1.0):
```yaml
compatibility.karpenter.sh/v1beta1-kubelet-conversion: {"kubeReserved":{"cpu":"90m","ephemeral-storage":"1Gi","memory":"1465Mi"}}
compatibility.karpenter.sh/v1beta1-nodeclass-reference: {"name":"default-ebs"}
```
Couple questions:

1. Wondering why it's only breaking when we jump from `v1.0.3` to `v1.0.4` (and not, for example, when we jumped from `v1.0.2` to `v1.0.3`)?
2. Is the suggestion here that we remove those annotations and retry the upgrade?
   - The values of those annotations seem like unrelated configs (?)
   - it also seems like the docs suggest we remove them during an eventual jump to `v1.1.x` (here we're going to `v1.0.4`)
The NodePool `compatibility.karpenter.sh/v1beta1-kubelet-conversion` annotation will take precedence over the EC2NodeClass kubelet configuration. At v1.1 the expectation is that the annotation is removed by customers.

> The compatibility.karpenter.sh/v1beta1-kubelet-conversion NodePool annotation takes precedence over the EC2NodeClass Kubelet configuration when launching nodes. Remove the kubelet-configuration annotation (compatibility.karpenter.sh/v1beta1-kubelet-conversion) from your NodePools once you have migrated kubelet from the NodePool to the EC2NodeClass.
ref: https://karpenter.sh/docs/upgrading/v1-migration/#before-upgrading-to-v11
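For a single NodePool, removing the annotation is a one-liner (a sketch; `my-nodepool` is a placeholder name, and the trailing `-` tells kubectl to delete the annotation):

```shell
# Remove the kubelet compatibility annotation from one NodePool
kubectl annotate nodepool my-nodepool compatibility.karpenter.sh/v1beta1-kubelet-conversion-
```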
> Wondering why it's only breaking when we jump from v1.0.3 to v1.0.4 (and not, for example, when we jumped from v1.0.2 to v1.0.3)?
We had a bug where any new NodeClaim that was launched used the EC2NodeClass kubelet configuration without considering the kubelet compatibility annotation. This fix was merged in at 1.0.2: https://github.com/kubernetes-sigs/karpenter/pull/1667. Are you able to share nodes that were created on 1.0.3, with their NodePools and EC2NodeClass kubelet configurations?
> Are you able to share nodes that were created on 1.0.3, with their NodePools and EC2NodeClass kubelet configurations?
Yes, here's a slightly anonymized share of that:
To reiterate the behavior for complete understanding:

> We had a bug where any new NodeClaim that was launched used the EC2NodeClass kubelet configuration without considering the kubelet compatibility annotation. This fix was merged in at 1.0.2: https://github.com/kubernetes-sigs/karpenter/pull/1667

I believe this is the issue (and why we only see it when we upgrade to `karpenter-provider-aws:v1.0.4`).

Looking at all commit differences between `karpenter-provider-aws` v1.0.3 and v1.0.4, it looks like the fix merged in https://github.com/kubernetes-sigs/karpenter/pull/1667 was only finally pulled into `karpenter-provider-aws:v1.0.4`:
If I'm reading that correctly, what does that mean? Do Karpenter users who upgrade from `karpenter-provider-aws:v1.0.3` to `karpenter-provider-aws:v1.0.4` need to manually remove that kubelet compatibility annotation from all their NodePools (regardless of what value that annotation has)?
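If removal across the board is the answer, something like this sketch would do it (assuming `kubectl annotate --all` against the cluster-scoped NodePool resource; the trailing `-` deletes the annotation):

```shell
# Strip the kubelet compatibility annotation from every NodePool
kubectl annotate nodepools --all compatibility.karpenter.sh/v1beta1-kubelet-conversion-
```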
Description
Observed Behavior:
After upgrading from `karpenter-provider-aws:v1.0.3` to `v1.0.4`, the new Kubernetes nodes that Karpenter provisions do not have the EC2NodeClass's `spec.kubelet.clusterDNS` value in their `/etc/kubernetes/kubelet/config`.

For example, one of our EC2NodeClasses (e.g., `default-ebs`) looks like this:
```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "..."
  name: default-ebs
spec:
  kubelet:
    clusterDNS:
      - 169.254.20.11
  # ...
```

Notice `spec.kubelet.clusterDNS: [169.254.20.11]`.
However, on the new node, we see the default value (`cluster-dns-ip = '172.20.0.10'`) instead.
Also confirmed the NodePool the node is part of refers to the same EC2NodeClass as above:

```
Node Class Ref:
  Group:  karpenter.k8s.aws
  Kind:   EC2NodeClass
  Name:   default-ebs
```

Expected Behavior:
After reverting back to `v1.0.3`, we see the correct (expected) value on a new node.

Reproduction Steps (Please include YAML):
See above:

1. On `karpenter-provider-aws:v1.0.3`, set `spec.kubelet.clusterDNS` on the EC2NodeClass; `/etc/kubernetes/kubelet/config` on the node has that same value.
2. Upgrade to `karpenter-provider-aws:v1.0.4`; newly provisioned nodes no longer get that value.
Versions:

- Chart/Image Version: `v1.0.4`
- Kubernetes Version (`kubectl version`): `Server Version: v1.30.4-eks-a737599`
- Node version: `bottlerocket v1.24.1`
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment