aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Binpacking algorithm incorrectly selects instance type with insufficient allocatable memory #1034

Closed jalawala closed 2 years ago

jalawala commented 2 years ago

Version

Karpenter: v0.5.3

Kubernetes: v1.21.5

Expected Behavior

Karpenter should wait long enough for nodes to become Ready before considering them unhealthy. The slow start-up of small instance types should be taken into account before a node is treated as unhealthy and removed.

Actual Behavior

This issue is observed with a smaller instance type such as t4g.micro, which takes a longer time to reach the Ready state.

T0 = pod is Pending (log line 4): arm64-6bcdd8f45-95j9v 0/1 Pending 0 5s ip-192-168-77-153.ec2.internal

T1 = pod is terminated by Karpenter on the assumption that something is wrong with the node, but the node is simply not Ready yet (log line 5): arm64-6bcdd8f45-95j9v 0/1 Terminating 0 52s ip-192-168-77-153.ec2.internal

T2 = node actually becomes Ready (log line 161): ip-192-168-77-153.ec2.internal Ready 67s v1.21.5-eks-bc4871b t4g.micro arm64 spot

Since the pod has been terminated and the node is empty, the empty-node TTL is applied and the node is removed. Karpenter then launches a new instance, the same issue repeats forever, and the pod never gets scheduled. The attached file captures this behaviour in detail, including logs.

Steps to Reproduce the Problem

I used the below config for arm64:

nodeSelector:
  kubernetes.io/arch: arm64
containers:

Karpenter selected t4g.micro, where this issue is observed. Detailed logs are attached below.

Resource Specs and Logs

karpenter.txt

mikesir87 commented 2 years ago

I’ve run into this as well and I think I narrowed it down to the number of DaemonSets I had running. For a t4g.micro instance, the max pod limit is only 4, so there actually wasn’t enough room on the nodes to run the workloads. Are you running any DaemonSets?
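For context, a minimal sketch of the ENI-based max-pods formula used with the AWS VPC CNI (max pods = ENIs x (IPv4 addresses per ENI - 1) + 2); the t4g.micro networking limits used here (2 ENIs, 2 IPv4 addresses per ENI) are assumptions taken from the AWS instance-type documentation, not something confirmed in this thread:

```go
package main

import "fmt"

// maxPods computes the ENI-based pod limit used with the AWS VPC CNI:
// each ENI contributes (IPv4 addresses per ENI - 1) pod IPs, plus 2 to
// account for host-networking pods such as aws-node and kube-proxy.
func maxPods(enis, ipsPerENI int) int {
	return enis*(ipsPerENI-1) + 2
}

func main() {
	// Assumed t4g.micro limits: 2 ENIs, 2 IPv4 addresses per ENI.
	fmt.Println(maxPods(2, 2)) // 4, matching the limit mentioned above
}
```

DaemonSet pods and host-networking system pods count against this limit, which is why a t4g.micro can fill up before any workload pod is scheduled.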

ellistarn commented 2 years ago

It looks like our maxpods calculations are not factoring in daemonsets. We need to fix this.
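A minimal sketch of the accounting being described, with hypothetical names and not Karpenter's actual binpacking code: DaemonSet pods land on every new node, so the pod slots available for pending pods are the instance's pod limit minus the DaemonSet pods that will be scheduled onto it.

```go
// schedulablePodSlots returns how many pending pods can still be placed on a
// candidate node once the DaemonSet pods that will run there are accounted for.
// Illustrative sketch only; the function name and signature are hypothetical.
func schedulablePodSlots(instanceMaxPods, daemonSetPods int) int {
	slots := instanceMaxPods - daemonSetPods
	if slots < 0 {
		return 0
	}
	return slots
}
```

With the t4g.micro limit of 4, a few DaemonSets already leave no room for workload pods, matching the behaviour reported above.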

mikesir87 commented 2 years ago

I bet this is what I was running into with #930. I’ll close that one in favor of this one.

olemarkus commented 2 years ago

It looks like static/mirror Pods also need to be taken into consideration. I frequently run into OutOfPods errors, which I think come from Pods being scheduled to a given node but then entering this state because static pods take precedence.

felix-zhe-huang commented 2 years ago

I have recreated the original issue. The t4g.micro node is terminated because pods fail to schedule on it due to insufficient memory (the pod requests 1000Mi and t4g.micro's allocatable memory is only 558Mi). (Screenshots attached: Screen Shot 2022-01-04 at 12 36 38 PM, Screen Shot 2022-01-04 at 12 37 40 PM.)

The issue is caused by an incorrect resource calculation in the binpacking algorithm.

There is potentially a problem with the max-pods calculation as well. I will fix the wrong resource calculation first and then do some further experiments to confirm whether the max-pods calculation is also incorrect.
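To illustrate the calculation error being described, here is a minimal sketch of the memory feasibility check a binpacker needs (simplified and assumed, not Karpenter's actual implementation): the pod's memory request must be compared against the node's allocatable memory, i.e. capacity minus system reservations and eviction thresholds, with DaemonSet requests subtracted as well, rather than against the instance's raw capacity.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// fitsMemory reports whether a pod's memory request fits on a candidate
// instance type, comparing against allocatable memory (not raw capacity)
// and leaving room for DaemonSet pods already assigned to the node.
// Simplified, illustrative sketch only.
func fitsMemory(allocatable, daemonSetRequests, podRequest resource.Quantity) bool {
	free := allocatable.DeepCopy()
	free.Sub(daemonSetRequests)
	return free.Cmp(podRequest) >= 0
}

func main() {
	// Numbers from this issue: t4g.micro allocatable is 558Mi, the pod requests 1000Mi.
	fmt.Println(fitsMemory(
		resource.MustParse("558Mi"),
		resource.MustParse("0"),
		resource.MustParse("1000Mi"),
	)) // false: a correct binpacker would never select t4g.micro for this pod
}
```

With the numbers from this issue (558Mi allocatable versus a 1000Mi request), such a check rejects t4g.micro up front instead of launching a node the pod can never run on.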

felix-zhe-huang commented 2 years ago

Created a separate issue to track the DaemonSet problem mentioned above (#1084).

felix-zhe-huang commented 2 years ago

The issue is resolved by https://github.com/aws/karpenter/pull/1080. The fix is coming in 0.5.4.