Closed: fullykubed closed this issue 1 day ago
There is a reason why we are propagating these labels as requirements on `NodeClaim`. These labels are propagated to nodes, which are instrumental in node selection by using `nodeAffinity` or `nodeSelector`s on your Pods.
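For readers less familiar with this mechanism, node labels drive scheduling roughly like this (a minimal sketch; the label key and value are illustrative, not taken from this issue):

```yaml
# Hypothetical Pod that is only schedulable onto nodes carrying team=billing.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  nodeSelector:
    team: billing   # matches a node label Karpenter propagated from the NodePool
  containers:
    - name: app
      image: nginx
```

A `nodeAffinity` term with `matchExpressions` on the same key would constrain scheduling equivalently.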
> These labels are propagated to nodes which are instrumental in node selection by using `nodeAffinity` or `nodeSelector`s on your Pods.
This part I understand and my intent is to provide an arbitrary number of labels to generated nodes.
> There is a reason why we are propagating these labels as requirements on `NodeClaim`.
This part I do not understand. I am sure there is a reason, but it is unclear why every node label must be an individual requirement on every `NodeClaim`. Additionally, this does not appear to be a strict rule, as I can see Karpenter add many node labels that are not treated as requirements.
Maybe my mental model is incorrect, but wouldn't it make more sense to add a label as a requirement only if it is used in a label selector by a pending pod?
When Karpenter tries to schedule your workload, it has to take into consideration all the requirements from the `NodePool` (labels as well as requirements) as well as the pod's requirements. If a pod has a node affinity for a specific label, then Karpenter needs to know that it is a requirement for the launch, and it orchestrates that by adding it to the `NodeClaim` requirements.
> If a pod has a node affinity for a specific label then Karpenter needs to know that it is a requirement for the launch and it orchestrates that by adding it to nodeClaim requirements.
I think you are highlighting the point I am trying to make: even though no pods are using these labels for node selection, all labels from the NodePool's `spec.template.metadata.labels` are still being added as requirements.
My expectation would be that only those labels that are being used by pods for node selection would be added as requirements.
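To illustrate the behavior being questioned (hypothetical names; this is not the YAML from the original report), a label that no pod selects on still reappears as a requirement:

```yaml
# NodePool snippet (hypothetical):
spec:
  template:
    metadata:
      labels:
        cost-center: "1234"   # purely informational; no pod selects on it
---
# Generated NodeClaim (hypothetical): the label is also added as a requirement
spec:
  requirements:
    - key: cost-center
      operator: In
      values: ["1234"]
```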
In particular, I do not believe this statement is correct:
> it has to take into consideration all the requirements from nodePool (labels as well as requirements)
At the very least this does not seem to be the documented behavior for NodePools.
Per the documentation for `spec.template.spec.requirements`:
```yaml
# Requirements that constrain the parameters of provisioned nodes.
# These requirements are combined with pod.spec.topologySpreadConstraints, pod.spec.affinity.nodeAffinity, pod.spec.affinity.podAffinity, and pod.spec.nodeSelector rules.
```
There is no mention of `spec.template.metadata.labels` being implicitly added as additional requirements.
Also, this conceptually doesn't seem to make sense. If someone wanted a node label to always be used as a requirement, that is the purpose of the `spec.template.spec.requirements` field.
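For comparison, a label that is genuinely meant to constrain provisioning can already be expressed explicitly in that field (a sketch, using the well-known `karpenter.sh/capacity-type` key):

```yaml
# Hypothetical NodePool fragment: an explicit, intentional scheduling constraint.
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
```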
Adding all of the labels implicitly, even when they are never used by pods for scheduling, seems like a bug and adds an unnecessary limitation to Karpenter (a hard cap on the number of labels you can apply to nodes).
While I understand the point you are trying to make here, if we expect the node to have a label, then we consider it a requirement on the `NodeClaim`.
However, for your use case we can go ahead and increase the limit to a number of labels that works for you. Does increasing the limit for `spec.requirements` to 100 work for you?
100 would be more than enough for my use case personally. I appreciate your flexibility in helping to find a workaround.
I also think that if adding all node labels as requirements is the intended behavior and isn't going to change, that should probably be added to the documentation along with a warning that there is a hard cap of 100 total requirements and labels.
Closing since we've merged the change to allow 100 requirements. This change should be present in the next release. We're going to do a docs update in a separate change that won't be tracked by this original issue.
Description
Observed Behavior:
Labels added to `spec.template.metadata` of `NodePool` resources are added to the requirements array of generated `NodeClaim` resources. If the number of labels exceeds 30, the following errors are generated:
In practice, the error occurs with far fewer than 30 labels, as you would likely have other requirements as well.
Expected Behavior:
I imagine there is some intended purpose for propagating labels as requirements, but I am not sure what it might be.
I think that this should be opt-in behavior rather than the default (or it might be an outright bug). I do not know what benefit this would provide in the majority of circumstances, but it hinders the ability to apply labels that should not impact scheduling nodes (for example, for ownership and billing purposes).
At the very least, I think this behavior should be documented.
Reproduction Steps (Please include YAML):
This `NodePool` would generate this `NodeClaim`:
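The original manifests were not captured here; a minimal hypothetical reproduction would be a `NodePool` whose `spec.template.metadata.labels` carries more than 30 entries, along these lines (label keys are illustrative placeholders):

```yaml
# Hypothetical NodePool for reproducing the requirements cap.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: many-labels
spec:
  template:
    metadata:
      labels:
        label-01: "a"
        label-02: "b"
        # ... continue up to more than 30 label entries ...
```

Each of these labels would then appear in the generated `NodeClaim`'s `requirements` array, tripping the cap.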
Versions:

- Chart Version: `0.37.0`
- Kubernetes Version (`kubectl version`): `v1.29.4-eks-036c24b`
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment