aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Karpenter keeps launching new nodes and the pod cannot be scheduled #6543

Open andyblog opened 4 months ago

andyblog commented 4 months ago

Description

Observed Behavior:

The pod requests 7 CPU cores and 5 GB of memory. The NodePool pins the instance type to c5.2xlarge, which has 8 cores and 16 GB, so a node of this size should be able to run the pod normally. Instead, Karpenter keeps launching new nodes and the pod stays Pending.

My understanding is: the kubelet on the node reserves 500m CPU for itself, so the allocatable CPU is 7500m. The daemonset pods request another 700m CPU, so once the node starts and the daemonsets are scheduled, there is no longer room for the centos pod, and Karpenter keeps launching new machines.
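Rough math with the numbers above (the exact reservation depends on the AMI bootstrap defaults):

c5.2xlarge CPU capacity:       8000m
kubelet/system reservation:    -500m
allocatable CPU:               7500m
daemonset requests:            -700m
left for workload pods:        6800m   (< 7000m requested by the centos pod)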

Expected Behavior:

Other teams must have run into this situation. What is the best practice for avoiding it? Is it possible to add a fixed resource reservation for the kubelet in the Karpenter configuration?

How does Karpenter calculate the required node size? Is it determined by the resources requested by the workload pod plus the daemonset pods?

Reproduction Steps (Please include YAML):

karpenter version

$ helm list -n karpenter
NAME        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART               APP VERSION
karpenter   karpenter   6           2024-05-22 11:24:01.974303415 +0800 CST deployed    karpenter-v0.32.9   0.32.9     

karpenter nodepool config

$ k get nodepool dev-git -o yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: dev-git
spec:
  disruption:
    consolidateAfter: 5m0s
    consolidationPolicy: WhenEmpty
    expireAfter: Never
  template:
    metadata: {}
    spec:
      nodeClassRef:
        name: dev-git
      requirements:
      - key: app.abc.test-eks/business
        operator: In
        values:
        - git
      - key: app.abc.test-eks/env
        operator: In
        values:
        - dev
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - c5.2xlarge

my deployment yaml

$ cat centos.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: centos
  namespace: git-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: centos
  template:
    metadata:
      labels:
        name: centos
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: app.abc.test-eks/business
                    operator: In
                    values:
                      - git
                  - key: app.abc.test-eks/env
                    operator: In
                    values:
                      - dev
      containers:
      - name: centos1
        image: centos:7
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "7"
            memory: 5Gi
          limits:
            cpu: "7"
            memory: 5Gi

        command: ["sleep", "infinity"]

log

$ k get NodeClaim | grep git
dev-git-7ldrb        c5.2xlarge      ap-northeast-1c   ip-10-10-1-174.ap-northeast-1.compute.internal   False   28s
dev-git-8smvf        c5.2xlarge      ap-northeast-1c   ip-10-10-1-154.ap-northeast-1.compute.internal   True    2m47s
dev-git-bwhxs        c5.2xlarge      ap-northeast-1c   ip-10-10-1-81.ap-northeast-1.compute.internal    True    4m4s
dev-git-cs5tg        c5.2xlarge      ap-northeast-1c   ip-10-10-1-254.ap-northeast-1.compute.internal   True    5m48s
dev-git-gkms8        c5.2xlarge      ap-northeast-1c   ip-10-10-1-24.ap-northeast-1.compute.internal    True    2m18s
dev-git-hshrz        c5.2xlarge      ap-northeast-1c   ip-10-10-1-198.ap-northeast-1.compute.internal   True    114s
dev-git-lqsgw        c5.2xlarge      ap-northeast-1c   ip-10-10-1-248.ap-northeast-1.compute.internal   True    4m58s
dev-git-mq2nf        c5.2xlarge      ap-northeast-1c   ip-10-10-1-74.ap-northeast-1.compute.internal    True    54s
dev-git-qdbgd        c5.2xlarge      ap-northeast-1c   ip-10-10-1-57.ap-northeast-1.compute.internal    True    4m28s
dev-git-qtshk        c5.2xlarge      ap-northeast-1c   ip-10-10-1-236.ap-northeast-1.compute.internal   True    3m8s
dev-git-sqqqn        c5.2xlarge      ap-northeast-1c   ip-10-10-1-91.ap-northeast-1.compute.internal    True    87s
dev-git-wzfhd        c5.2xlarge      ap-northeast-1c   ip-10-10-1-218.ap-northeast-1.compute.internal   True    3m38s

the pod is still Pending

$ k get pod
NAME                      READY   STATUS    RESTARTS   AGE
centos-654b59b549-5hnn9   0/1     Pending   0          7m4s

Versions:

aowczarek619 commented 4 months ago

It also happens on version 0.37.0.

k describe <node>
.
.
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        507m (12%)  302m (7%)
.
.

My pods request 3.5 CPU and 15Gi of memory, and in my case Karpenter creates an r5dn.xlarge, which is too small for the pod; only the daemonset pods end up running on it.
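Rough math, assuming the default EKS AMI reservation formula (roughly 80m of CPU kube-reserved on a 4-vCPU node; the exact value depends on the AMI and its settings):

r5dn.xlarge CPU capacity:      4000m
kube/system reserved (est.):    -80m
allocatable CPU (approx.):     3920m
daemonset requests:            -507m
left for workload pods:       ~3413m   (< 3500m requested by the pod)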

indranilvyasFlipp commented 4 months ago

I'm also noticing a similar issue: Karpenter schedules pods against a NodeClaim, but the default-scheduler decides the resulting node has insufficient resources and won't run the pod on it, so the pod stays stuck in Pending for many hours, sometimes days.

andyblog commented 4 months ago

If you encounter this problem, the Karpenter docs linked below describe how to add a resource reservation to the configuration to address it. It would be good if the maintainers could confirm whether there is another way to solve it. https://karpenter.sh/docs/concepts/nodepools/#reserved-resources
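For the v1beta1 API used here (0.32.x), that reservation is configured on the NodePool under spec.template.spec.kubelet. A minimal sketch with only the relevant fields, along the lines of the linked docs (the values are illustrative, not recommendations, and should match what the node bootstrap actually reserves):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: dev-git
spec:
  template:
    spec:
      nodeClassRef:
        name: dev-git
      kubelet:
        # Illustrative values; align them with the real kubelet/system reservation on the node.
        systemReserved:
          cpu: 100m
          memory: 100Mi
        kubeReserved:
          cpu: 500m
          memory: 1Gi

As far as I understand, Karpenter subtracts these reservations and the daemonset requests from an instance type's capacity when deciding whether a pending pod fits, so keeping them in line with the real node is what stops the launch loop.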

indranilvyasFlipp commented 4 months ago

We tried bumping up the kube-reserved memory and still see cases where a node is created but kube-scheduler is unhappy with it and won't place the pod. We noticed that back on v0.27.x Karpenter was able to relax the topologySpreadConstraints and place pods properly, and our daemonsets haven't changed before or after the Karpenter upgrade.

I have two deployments with the same topologySpreadConstraints (the shape of the constraint is sketched below), and one of them still cannot get its pods scheduled while the other succeeds.
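For reference, the kind of constraint involved goes in the pod template spec and looks roughly like this (a generic example, not our actual manifests; the app label is hypothetical):

      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule   # hard constraint
        labelSelector:
          matchLabels:
            app: my-app   # hypothetical selector

With whenUnsatisfiable: DoNotSchedule the constraint is hard, so Karpenter's scheduling simulation and kube-scheduler both have to agree it can be satisfied; ScheduleAnyway makes it a soft preference instead.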