aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

exhausted IP addresses from unbalanced zone distribution #7311

Open universam1 opened 1 week ago

universam1 commented 1 week ago

Description

Observed Behavior: The IP range of one subnet is exhausted, causing "dead" nodes, while subnets in the other zones are left empty.

This is a follow-up to #1810 and #1292, as the topology-spread solution suggested there does not scale on large clusters across independent teams, namespaces, and the like.

Across dozens of deployments we cannot instruct every developer to apply https://karpenter.sh/v0.10.0/tasks/scheduling/#topology-spread so that it matches exactly across all teams.
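For reference, this is roughly what that per-workload constraint looks like (a minimal sketch; the Deployment name, labels, and image are placeholders, not from our setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      # Every team would need to add this block, with matching settings,
      # to every workload for zone balancing to hold cluster-wide.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: example-app
      containers:
      - name: app
        image: public.ecr.aws/docker/library/busybox:1.36  # placeholder image
        command: ["sleep", "infinity"]
```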

ClusterAutoscaler has this option for a reason: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler
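That Cluster Autoscaler behavior is enabled through its `--balance-similar-node-groups` flag; a minimal sketch of how it is typically set (the image tag and surrounding Deployment fragment are illustrative only):

```yaml
# Illustrative container fragment of a cluster-autoscaler Deployment;
# the --balance-similar-node-groups flag is the relevant part.
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # example version
  command:
  - ./cluster-autoscaler
  - --balance-similar-node-groups=true
  - --cloud-provider=aws   # assumed for an EKS setup
```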

Expected Behavior:

Karpenter should treat zone balancing as a requirement when scheduling nodes.

engedaam commented 1 week ago

Can you provide your Karpenter configuration? Karpenter should launch nodes into the subnets with the most available IP addresses, except where affinity or topology-spread constraints dictate otherwise. Do you have any spread or affinity on your workloads currently?

universam1 commented 3 days ago

> Can you provide your Karpenter configuration?

Sure @engedaam, please find the config below.

Configuration:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "5078040335181941408"
    karpenter.sh/nodepool-hash-version: v2
  name: al2023
spec:
  disruption:
    budgets:
    - nodes: 20%
    - duration: 55m
      nodes: "0"
      schedule: '@hourly'
    consolidationPolicy: WhenUnderutilized
    expireAfter: 168h
  limits:
    cpu: "125"
    memory: 1000Gi
  template:
    spec:
      nodeClassRef:
        name: al2023
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
        - t
      - key: karpenter.k8s.aws/instance-cpu
        operator: Gt
        values:
        - "3"
      - key: karpenter.k8s.aws/instance-cpu
        operator: Lt
        values:
        - "33"
      - key: karpenter.k8s.aws/instance-memory
        operator: Gt
        values:
        - "4000"
      - key: karpenter.k8s.aws/instance-memory
        operator: Lt
        values:
        - "66000"
      - key: karpenter.k8s.aws/instance-ebs-bandwidth
        operator: Gt
        values:
        - "2000"
      - key: karpenter.k8s.aws/instance-hypervisor
        operator: In
        values:
        - nitro
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "3"
      startupTaints:
      - effect: NoExecute
        key: node.cilium.io/agent-not-ready
        value: "true"
  weight: 90
```

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "11350300940085964065"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
  finalizers:
  - karpenter.k8s.aws/termination
  name: al2023
spec:
  amiFamily: AL2023
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      encrypted: true
      throughput: 125
      volumeSize: 200Gi
      volumeType: gp3
  instanceProfile: o11n-eks-xxx
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  securityGroupSelectorTerms:
  - id: sg-022e0610xxx
  subnetSelectorTerms:
  - id: subnet-0e0d9c1xx
  - id: subnet-0fff56bxx
  - id: subnet-0884d45xx
  tags:
    Name: kubernetes.io/cluster/o11n-eks-o11n-union
    System: o11n-eks-o11n-union
    jw:owner: eks
    jw:project: o11n/eks
    jw:stage: union
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        InstanceIdNodeName: false # https://github.com/awslabs/amazon-eks-ami/issues/1821
      kubelet:
        config:
          featureGates:
            DisableKubeletCloudCredentialProviders: true
          registryPullQPS: 100
          serializeImagePulls: false
          shutdownGracePeriod: 30s
    --//
```

> Do you have any spread or affinity on your workloads currently?

We did not - however, as a dirty workaround, we do now, to force Karpenter into the other zones. That is not a solution. I believe Karpenter becomes unbalanced because of instance availability and cost differences between zones; it is actually flapping between them. Since it causes downtime for us, this is a severe issue.
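What such a workaround can look like at the pod level (an assumption, since the actual manifests are not shown in this thread; zone names are examples):

```yaml
# Pod-spec fragment pushing workloads toward zones whose subnets
# still have free IPs; zone values are examples, not our real zones.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - eu-central-1b
          - eu-central-1c
```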