hryamzik opened this issue 3 years ago
Thanks for asking @hryamzik. I cannot find that code in eksctl anymore, so it looks like it was removed. I will look into why, and if it happened by accident I will put it back 👍
same issue also exists with
eksctl version: 0.56.0
kubectl version: v1.21.2
OS: darwin
This probably has something to do with https://github.com/weaveworks/eksctl/blob/69c1a78f12017f47bcb247f388e9f9442f7da11c/pkg/apis/eksctl.io/v1alpha5/validation.go#L477, which eventually drills down to allowing https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/staging/src/k8s.io/api/core/v1/well_known_labels.go#L51. I'm not sure how this now allows setting the role.
I was not able to dig into this further today, and I am away from now until the week after next. Free for anyone in the team or community to pick up 👍
While eksctl allows node-role.kubernetes.io labels, the CloudFormation EKS resource AWS::EKS::Nodegroup (managed nodegroup) rejects them. @TBBle has explained the issue well here.
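For reference, a minimal config of the shape being discussed (illustrative only; the cluster name, region, and role suffix are placeholders):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # placeholder
  region: eu-west-1       # placeholder
managedNodeGroups:
  - name: backend
    instanceType: m5.large
    desiredCapacity: 2
    labels:
      # eksctl's validation lets this through, but the EKS Managed
      # Nodegroups API rejects keys under the reserved kubernetes.io namespace.
      node-role.kubernetes.io/backend: ""
```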
Indeed, this is the same core issue as #2363, but using Managed Node Groups, so the error is different.
Hence, the solution is blocked on some development work elsewhere; Managed Node Groups might be solved via the same thing that solves #2363, e.g., https://github.com/kubernetes/cloud-provider-aws/issues/110, but it's also possible that AWS could implement a Managed Node Groups-specific solution which would resolve this problem, but not help #2363.
Interestingly, there's no feature request ticket open on https://github.com/aws/containers-roadmap/ for this. The ticket I linked to before was actually for setting the node-role label to the node group name, rather than the label specified in the config, which is a more-specific use-case than what we have here.
Edit: I opened a feature request upstream for EKS Managed Node Groups to support this: https://github.com/aws/containers-roadmap/issues/1451
As @gohmc and @TBBle have pointed out, the error is from the Managed Nodegroups API itself and not from a validation performed in eksctl. eksctl is blocked on the upstream API adding support for reserved prefixes in labels.
I'd say this isn't a dupe, as that ticket was looking at unmanaged nodegroups (i.e. closer to #2363), and its proposed solution approach solves a different problem. For both this ticket and #2363, the use-case is setting the node role, which I don't think would be considered "solved" if it only applied to nodes created while eksctl was creating the nodegroup, since nodes created later, e.g. by the Cluster Autoscaler, would still have an empty role.
This issue is in a blocked state until AWS provides us with a reasonable way to keep labels across autoscaling. Maybe through Karpenter.
I expect that Karpenter, being a slot-in replacement for Cluster Autoscaler, will sadly have the same timing problem as a 'node label controller'.
It's not entirely slot-in. It has some extra features and a different algorithm. Are you sure it couldn't have the ability to reapply existing labels to nodes?
It could, but it'd be doing it at the same time as any other Controller (except an Admission Controller), i.e. after the Node object has been persisted, and racing with the kube-scheduler to apply labels before any other Controller tries to make decisions about them.
So it can do it, but so could Cluster Autoscaler, and so could a dedicated Controller. They all have the same chance of success, AFAIK. (Unless somehow Karpenter is creating the Node resource and then kubelet only updates it, but I didn't see any suggestion of that in its docs, and I'm not sure if k8s is robust against that anyway...)
The core issue is that in the Node admission workflow, the Node object is created by the kubelet, and after passing the Node Restriction Admission Controller, persisted and available to the scheduler. It's possible the Cloud Provider Manager can get involved and add labels to the Node before it's marked 'Ready', I'm not sure. I didn't see a place for it last time I looked, but opened https://github.com/kubernetes/cloud-provider-aws/issues/110 in case someone else could see what I couldn't.
Annoyingly, even a mutating Admission Controller (my other thought) might not work, because the Node Restriction Admission Controller would run after it, and might reject the labels even though kubelet itself didn't add them; I'm not sure if it can distinguish labels added by kubelet in its request from labels added to that request by a mutating Admission Controller.
@TBBle thanks for the explanation!
Reading about how Kubernetes handles nodes and how AWS does its thing, I have no idea how this could ever be resolved other than with a controller that continuously checks and updates the labels, which we sadly won't do. :/
If all you're trying to do is apply the `node-role` label as something other than `master`, and you're using it only for informative output in `kubectl get nodes` (and not for `nodeSelector`, or metrics tagging, for example), then a controller that reacts to new nodes and applies that label would be fine, because it doesn't materially matter if someone sees the node in the `kubectl` output for the few seconds before the label has been applied.
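A rough sketch of what such a controller might look like, assuming it runs in-cluster with RBAC permission to patch Nodes, and using a made-up `mycompany.com/node-role` label (set by the nodegroup) as the source of the role; this is not something eksctl or EKS provides:

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	// Copy the role from a custom label (which managed nodegroups may set)
	// into the restricted node-role.kubernetes.io/<role> label.
	labelNode := func(obj interface{}) {
		node, ok := obj.(*corev1.Node)
		if !ok {
			return
		}
		role, ok := node.Labels["mycompany.com/node-role"] // hypothetical source label
		if !ok {
			return
		}
		roleLabel := "node-role.kubernetes.io/" + role
		if _, exists := node.Labels[roleLabel]; exists {
			return // already labelled
		}
		patch, _ := json.Marshal(map[string]interface{}{
			"metadata": map[string]interface{}{
				"labels": map[string]string{roleLabel: ""},
			},
		})
		if _, err := client.CoreV1().Nodes().Patch(context.TODO(), node.Name,
			types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			log.Printf("failed to label node %s: %v", node.Name, err)
		}
	}

	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    labelNode,
		UpdateFunc: func(_, newObj interface{}) { labelNode(newObj) },
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // run until the Pod is stopped
}
```

As noted above, this still races with anything that reads the label right after a Node registers, so it only really covers the informational use-case.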
Otherwise, it's best to use your own label for such behaviour, which is what I've done in the past: used a `mycompany.com/node-role` label (and taint) to group nodes from multiple node groups under one of a few categories, e.g., backend, publicIP (for publicly reachable NodePort or HostPort Pods) and locust (to isolate the load-testing Pods from the load being tested).
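A hedged sketch of that approach with eksctl managed nodegroups (the label/taint key and group name are made up):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # placeholder
  region: eu-west-1       # placeholder
managedNodeGroups:
  - name: locust
    instanceType: m5.large
    labels:
      mycompany.com/node-role: locust   # custom prefix, so the API accepts it
    taints:
      - key: mycompany.com/node-role
        value: locust
        effect: NoSchedule
```

Pods in that group then carry a matching `nodeSelector` and toleration, and the label can still be surfaced in `kubectl get nodes` output via `-L mycompany.com/node-role`.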
I honestly think if they taught `kubectl` to also recognise a different label for the Node Role field, and rejected attempts to set that value to `master`, it'd be sufficiently secure for the use-case, and would resolve all this friction in the node-management community.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Check out a simple workaround here: https://hub.docker.com/repository/docker/enver/label-nodes/general
I'm trying to apply node labels using the following configuration:

According to #580 and #582, using the node-role subdomain should be the way to apply labels, however I still get a domain error:

Here's eksctl info: