eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

[Help] eksctl + cluster-autoscaler: scale-up from minSize=0 #5514

Closed: obitech closed this issue 2 years ago

obitech commented 2 years ago

I'm having trouble getting the cluster-autoscaler to work with a node group of minSize: 0. I’ve followed the eksctl docs on the topic and set my labels and taints as tags on the nodeGroup definition:

    minSize: 0
    maxSize: 1
    instanceType: r6g.large
    labels:
      role: worker
      node-role.exaring.net/workload-type: mem-intensive
    taints:
      - key: "node.cilium.io/agent-not-ready"
        value: "true"
        effect: "NoSchedule"
      - key: node-role.exaring.net/workload-type
        value: mem-intensive
        effect: NoSchedule
      - key: arch
        value: arm64
        effect: NoSchedule
    availabilityZones:
      - eu-central-1a
    tags:
      k8s.io/cluster-autoscaler/node-template/label/role: worker
      k8s.io/cluster-autoscaler/node-template/label/node-role.exaring.net/workload-type: mem-intensive
      k8s.io/cluster-autoscaler/node-template/taint/node.cilium.io/agent-not-ready: true:NoSchedule
      k8s.io/cluster-autoscaler/node-template/taint/node-role.exaring.net/workload-type: mem-intensive:NoSchedule
      k8s.io/cluster-autoscaler/node-template/taint/arch: arm64:NoSchedule

Checking the node group in the AWS console, the tags and taints seem to be correctly set on the node group itself. However, the autoscaler logs state:

klogx.go:86] Pod xxx is unschedulable
scale_up.go:300] Pod xxx can't be scheduled on yyy, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector;
scale_up.go:453] No expansion options

Has anyone experienced this before? How can I debug this further?
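
For reference, one way to inspect the tags on the nodegroup's Auto Scaling Group directly from the CLI (the ASG name below is a placeholder) would be:

# List the ASG tags the Cluster Autoscaler reads for scale-from-zero;
# replace <asg-name> with the nodegroup's actual ASG name.
aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=<asg-name>"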

github-actions[bot] commented 2 years ago

Hello obitech 👋 Thank you for opening an issue in the eksctl project. The team will review it and aim to respond within 1-3 business days. In the meantime, please read the Contribution and Code of Conduct guidelines. You can find more information about eksctl on our website.

nikimanoledaki commented 2 years ago

Hi! Sorry for the delay in getting back to you. I recommend making the following changes:

nodeGroups:
  - name: name
    minSize: 0
    maxSize: 1
    instanceType: r6g.large
    iam:
      withAddonPolicies:
        autoScaler: true # adds the tags required for the Cluster Autoscaler to scale the nodegroup(s)
    taints:
      - key: "node.cilium.io/agent-not-ready"
        value: "true"
        effect: "NoSchedule"
      - key: node-role.exaring.net/workload-type # removed the label that was repeating this
        value: mem-intensive
        effect: NoSchedule
      - key: arch
        value: arm64
        effect: NoSchedule
    propagateASGTags: true # propagates labels and taints into ASG tags
    availabilityZones:
      - eu-central-1a
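
If it helps for testing, here is a minimal sketch of a pod that should exercise the scale-up path for this nodegroup, assuming the config above (the pod name and image are placeholders; kubernetes.io/arch is the built-in node label that matches the arm64 r6g instances):

# Test pod targeting arm64 nodes and tolerating all three NoSchedule taints,
# so the Cluster Autoscaler should consider this nodegroup for scale-up.
apiVersion: v1
kind: Pod
metadata:
  name: scale-up-test   # placeholder name
spec:
  nodeSelector:
    kubernetes.io/arch: arm64   # built-in label; r6g instances are arm64
  tolerations:
    - key: node.cilium.io/agent-not-ready
      value: "true"
      effect: NoSchedule
    - key: node-role.exaring.net/workload-type
      value: mem-intensive
      effect: NoSchedule
    - key: arch
      value: arm64
      effect: NoSchedule
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9   # minimal placeholder image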

Please let me know if this solved the issue for you :)

I have a PR open to improve the docs around this; it should land soon.

obitech commented 2 years ago

Unfortunately the issue persists 😞 I forgot to mention that it's a managed nodegroup; might that be the issue?

nikimanoledaki commented 2 years ago

Yes, propagateASGTags has a slightly different behaviour for managed nodegroups (we have an open ticket to unify the behaviour of managed and unmanaged nodegroups). Currently, with propagateASGTags set to true, the labels and taints of a managed nodegroup are not converted to nodegroup tags, so they have to be added manually, as you were doing before:

managedNodeGroups:
  - name: name
    minSize: 0
    maxSize: 1
    instanceType: r6g.large
    iam:
      withAddonPolicies:
        autoScaler: true # adds the tags required for the Cluster Autoscaler to scale the nodegroup(s)
    taints:
      - key: "node.cilium.io/agent-not-ready"
        value: "true"
        effect: "NoSchedule"
      - key: node-role.exaring.net/workload-type # removed the label that was repeating this
        value: mem-intensive
        effect: NoSchedule
      - key: arch
        value: arm64
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/role: worker
      k8s.io/cluster-autoscaler/node-template/label/node-role.exaring.net/workload-type: mem-intensive
      k8s.io/cluster-autoscaler/node-template/taint/node.cilium.io/agent-not-ready: true:NoSchedule
      k8s.io/cluster-autoscaler/node-template/taint/node-role.exaring.net/workload-type: mem-intensive:NoSchedule
      k8s.io/cluster-autoscaler/node-template/taint/arch: arm64:NoSchedule
    propagateASGTags: true # propagates the nodegroup tags above into ASG tags
    availabilityZones:
      - eu-central-1a

The important part here is propagateASGTags: true, which propagates the nodegroup tags into ASG tags so that the Cluster Autoscaler can pick them up. :)
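
Assuming the Cluster Autoscaler runs as the conventional cluster-autoscaler deployment in kube-system, you can then follow its scale-up decisions in the logs:

# Follow the autoscaler's scale-up reasoning (the scale_up.go lines quoted above)
kubectl -n kube-system logs deployment/cluster-autoscaler -f | grep -i scale_up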

We recently updated the docs to reflect all of this in a better way: https://eksctl.io/usage/autoscaling/

This should hopefully solve the issue. Please let us know either way!

obitech commented 2 years ago

That did in fact work! Thank you for your help @nikimanoledaki 😌