aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Karpenter won't use fallback nodePool on insufficient instance capacity #6168

Open raychinov opened 6 months ago

raychinov commented 6 months ago

Description

Observed Behavior: We have a default nodePool in one availability zone (a) with a higher weight and a fallback nodePool in another availability zone (b) with a lower weight. In the scenario we test with AWS Fault Injection Simulator, all instances in AZ a are terminated and new instance launches there are paused. We observe new nodeClaims being created, but they stay in a Non-Ready state and the nodes fail to launch with an error message like:

creating instance, getting launch template configs, getting launch templates, no instance types satisfy requirements of amis ami-02f420afc14289ede

The concerning thing here is that Karpenter won't try to launch instances from the fallback nodePool even after several minutes of waiting. The messages in the log are mostly:

{"level":"DEBUG","time":"2024-05-08T11:26:25.025Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"2c8f2a5"}
{"level":"DEBUG","time":"2024-05-08T11:26:26.026Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"2c8f2a5"}
{"level":"DEBUG","time":"2024-05-08T11:26:27.027Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"2c8f2a5"}

If we increase the fallback nodePool's weight and scale up the workload so that new nodeClaims are created, those start in AZ b and get provisioned successfully, but the AZ a nodeClaims stay pending, as do the pods meant to run on them.

Expected Behavior: Unsuccessful nodeClaims should time out after a couple of seconds and be replaced by new ones created from an alternative nodePool. More verbose log messages would also be appreciated.
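(For anyone reproducing this: the stuck nodeClaims and their status conditions can be watched with plain kubectl; a minimal sketch, assuming the karpenter.sh/v1beta1 CRDs used here.)

```shell
# Watch nodeClaims as they are created; stuck ones never reach READY=True
kubectl get nodeclaims.karpenter.sh -o wide -w

# Inspect the status conditions and events of a stuck claim
# (<name> is a placeholder for one of the pending nodeClaims)
kubectl describe nodeclaim <name>
```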

Reproduction Steps (Please include YAML):

NodePools

```yaml
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
  annotations:
    kubernetes.io/description: "Default NodePool"
spec:
  weight: 100
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c7g.large", "c7g.xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - "eu-west-1a"
      nodeClassRef:
        name: default
      taints:
        - key: karpenter
          value: "true"
          effect: NoSchedule
      kubelet:
        maxPods: 125
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: fallback
  annotations:
    kubernetes.io/description: "Fallback NodePool"
spec:
  weight: 10
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c7g.large", "c7g.xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - "eu-west-1b"
            - "eu-west-1c"
      nodeClassRef:
        name: default
      taints:
        - key: karpenter
          value: "true"
          effect: NoSchedule
      kubelet:
        maxPods: 125
```
Workload

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
        - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          name: test
          resources:
            requests:
              cpu: 50m
              memory: 64M
      nodeSelector:
        karpenter.k8s.aws/instance-hypervisor: nitro
      tolerations:
        - key: karpenter
          operator: Equal
          value: "true"
          effect: NoSchedule
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - test
              topologyKey: kubernetes.io/hostname
```
FIS template

```json
{
  "description": "AZ fail",
  "targets": {
    "Karp-EC2-Instances": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "karpenter.k8s.aws/ec2nodeclass": "default"
      },
      "filters": [
        { "path": "State.Name", "values": ["running"] },
        { "path": "Placement.AvailabilityZone", "values": ["eu-west-1a"] }
      ],
      "selectionMode": "ALL"
    },
    "Karp-IAM-roles": {
      "resourceType": "aws:iam:role",
      "resourceArns": [
        "arn:aws:iam::111111111111:role/karpenter-controller-role",
        "arn:aws:iam::111111111111:role/karpenter-node-role"
      ],
      "selectionMode": "ALL"
    }
  },
  "actions": {
    "Pause-Instance-Launches": {
      "actionId": "aws:ec2:api-insufficient-instance-capacity-error",
      "parameters": {
        "availabilityZoneIdentifiers": "eu-west-1a",
        "duration": "PT10M",
        "percentage": "100"
      },
      "targets": {
        "Roles": "Karp-IAM-roles"
      }
    },
    "Terminate-Instances": {
      "actionId": "aws:ec2:terminate-instances",
      "parameters": {},
      "targets": {
        "Instances": "Karp-EC2-Instances"
      }
    }
  },
  "stopConditions": [
    { "source": "none" }
  ],
  "roleArn": "arn:aws:iam::111111111111:role/service-role/AWSFISIAMRole-1714174839723",
  "tags": {
    "Name": "AZ Availability: insufficient-instance-capacity-error"
  },
  "experimentOptions": {
    "accountTargeting": "single-account",
    "emptyTargetResolutionMode": "skip"
  }
}
```
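(For reference, an experiment template like the above can be registered and started with the AWS CLI; a minimal sketch, assuming the template JSON is saved as az-fail.json.)

```shell
# Register the experiment template (file name assumed)
aws fis create-experiment-template --cli-input-json file://az-fail.json

# Start the experiment with the template ID returned by the previous call
aws fis start-experiment --experiment-template-id <template-id>
```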

Versions:

jonathan-innis commented 6 months ago

Can you share the NodePool and EC2NodeClass that you are using here? Can you also share the entire set of Karpenter controller logs from the FIS simulation? Can you also share what exactly you are doing/executing during the FIS simulation (it seems like you are just ICE-ing all instance types across the single AZ)?

raychinov commented 6 months ago

Hey Jonathan, thank you for looking into this. I've shared the NodePool definitions in the issue description, and here are the requested EC2NodeClass and controller logs. And yes, in the FIS simulation we are terminating the Karpenter-managed nodes and ICE-ing all instance types across the eu-west-1a AZ. We also use a custom AMI in the EC2NodeClass, but I don't think that is what causes the issue.
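(The linked EC2NodeClass is not reproduced here; for context, a minimal karpenter.k8s.aws/v1beta1 EC2NodeClass using a custom AMI might look like the sketch below. This is an illustration only, not the reporter's actual config: the role name, discovery tags, and userData are assumptions, and only the AMI id is taken from the error message above.)

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Custom                  # custom AMI, so bootstrap comes from userData
  amiSelectorTerms:
    - id: ami-02f420afc14289ede      # AMI id from the error message above
  role: karpenter-node-role          # assumed role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # assumed discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # assumed discovery tag
  userData: |
    #!/bin/bash
    # node bootstrap for the custom AMI goes here (assumed)
```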

Looking into the logs again, it seems to me that what happens is:

When we run the same test with a fallback NodePool containing a different set of instance types, the fallback does work and new instances in eu-west-1b and eu-west-1c are provisioned successfully.
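(For clarity, the working variant only changed the fallback NodePool's instance-type requirement to a different family; the exact instance types below are assumed, not the reporter's.)

```yaml
# fallback NodePool requirement in the variant that did fail over
- key: node.kubernetes.io/instance-type
  operator: In
  values: ["c6g.large", "c6g.xlarge"]   # assumed; any set disjoint from the default pool's
```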

github-actions[bot] commented 6 months ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
