aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Controller no instance type satisfied resources #1783

Closed. Tasmana-banana closed this issue 2 years ago.

Tasmana-banana commented 2 years ago

Version

Karpenter: v0.9.1

Kubernetes: v1.20.0

I used to use Karpenter 0.5.3, but I decided it was time to upgrade. After a lot of configuration fixes, I ran into an error that I cannot overcome. Please help me solve this problem: which direction should I look in?

Ansible template for the Provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: omo-karpenter
spec:
  requirements:
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["{{ region }}a", "{{ region }}b", "{{ region }}c"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3a.medium"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "1000"
  provider:
    instanceProfile: "{{ instance_profile }}"
    subnetSelector:
      karpenter.sh/discovery: '{{ cluster_name }}'
    securityGroupSelector:
      kubernetes.io/cluster/test-eks-odoo: '*'
  ttlSecondsAfterEmpty: 120

I tried to scale the app to 3 pods, but got these errors:

2022-05-09T20:04:37.981Z    ERROR   controller  no instance type satisfied resources {"cpu":"1","pods":"1"} and requirements karpenter.sh/provisioner-name In [omo-karpenter], karpenter.sh/capacity-type In [on-demand], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], node.kubernetes.io/instance-type In [t3a.medium], kubernetes.io/hostname In [hostname-placeholder-0036], topology.kubernetes.io/zone In [us-west-2a us-west-2b us-west-2c]    {"commit": "a7e26e6", "pod": "default/inflate-59664786cf-4n82r"}
2022-05-09T20:04:37.981Z    INFO    controller  Waiting for unschedulable pods  {"commit": "a7e26e6"}
2022-05-09T20:04:42.970Z    INFO    controller  Batched 1 pod(s) in 1.000404385s    {"commit": "a7e26e6"}
2022-05-09T20:04:42.978Z    DEBUG   controller  Relaxing soft constraints for pod since it previously failed to schedule, adding: toleration for PreferNoSchedule taints    {"commit": "a7e26e6"}
2022-05-09T20:04:42.979Z    ERROR   controller  no instance type satisfied resources {"cpu":"1","pods":"1"} and requirements kubernetes.io/os In [linux], topology.kubernetes.io/zone In [us-west-2a us-west-2b us-west-2c], karpenter.sh/provisioner-name In [omo-karpenter], karpenter.sh/capacity-type In [on-demand], kubernetes.io/arch In [amd64], node.kubernetes.io/instance-type In [t3a.medium], kubernetes.io/hostname In [hostname-placeholder-0038]    {"commit": "a7e26e6", "pod": "default/inflate-59664786cf-4n82r"}
2022-05-09T20:04:42.979Z    INFO    controller  Waiting for unschedulable pods  {"commit": "a7e26e6"}

Update: an instance was created, but this error is still triggered. Can someone suggest a finer-grained setting?

Tasmana-banana commented 2 years ago

Simple test deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      # nodeSelector:
      #   karpenter.sh/capacity-type: on-demand
      #   node.kubernetes.io/instance-type: t3a.medium
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
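
For completeness, the scale-up step from the first comment can be reproduced with a plain kubectl command (the deployment name comes from the manifest above):

kubectl scale deployment inflate --replicas=3
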
dewjam commented 2 years ago

Hello @Tasmana-banana, do you have any DaemonSets running in this cluster that are also requesting CPU resources?

Karpenter takes into account the resources required by DaemonSets when scheduling pods. If the sum of the DaemonSet and pod resource requests is greater than what is available on a t3a.medium, then Karpenter will not be able to launch a node.
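
As an illustrative check (the command is an assumption, not from the thread), the requests declared by the DaemonSets can be listed and compared against the 2 vCPUs of a t3a.medium (allocatable CPU is a bit lower after kube and system reservations):

# List CPU/memory requests per DaemonSet; their sum plus the pod's 1-CPU
# request has to fit on a single t3a.medium for Karpenter to pick it.
kubectl get daemonsets -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CPU:.spec.template.spec.containers[*].resources.requests.cpu,MEMORY:.spec.template.spec.containers[*].resources.requests.memory'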

Tasmana-banana commented 2 years ago

Thanks for the answer, @dewjam. No, it's a new EKS cluster.

Output of kubectl get daemonsets -A:

NAMESPACE     NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   aws-node     3         3         3       3            3           <none>          11h
kube-system   kube-proxy   3         3         3       3            3           <none>          11h

kubectl get pods -A

NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
default       inflate-59664786cf-4ng7k          1/1     Running   0          91m
default       inflate-59664786cf-hjg7k          1/1     Running   0          27m
default       inflate-59664786cf-rb5dz          0/1     Pending   0          117s
karpenter     karpenter-577fb865d7-wjpnp        2/2     Running   0          117s
kube-system   aws-node-8fxct                    1/1     Running   0          25m
kube-system   aws-node-klbm7                    1/1     Running   0          11h
kube-system   aws-node-t6pp2                    1/1     Running   0          11h
kube-system   coredns-86d9946576-f7ns8          1/1     Running   0          11h
kube-system   coredns-86d9946576-ts4jc          1/1     Running   0          11h
kube-system   kube-proxy-jm8fr                  1/1     Running   0          25m
kube-system   kube-proxy-kgdr9                  1/1     Running   0          11h
kube-system   kube-proxy-shxnz                  1/1     Running   0          11h
kube-system   metrics-server-6594d67d48-s8vb7   1/1     Running   0          9h
dewjam commented 2 years ago

Would you mind providing the output of the below command as well?

kubectl get pods -A -o wide

Tasmana-banana commented 2 years ago

@dewjam

NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
default       inflate-59664786cf-4ng7k          1/1     Running   0          16h    10.65.5.38    ip-10-65-5-129.us-west-2.compute.internal   <none>           <none>
default       inflate-59664786cf-5fpfh          0/1     Pending   0          106s   <none>        <none>                                      <none>           <none>
default       inflate-59664786cf-wcd4w          1/1     Running   0          107s   10.65.8.230   ip-10-65-9-49.us-west-2.compute.internal    <none>           <none>
karpenter     karpenter-577fb865d7-wjpnp        2/2     Running   0          15h    10.65.7.154   ip-10-65-7-49.us-west-2.compute.internal    <none>           <none>
kube-system   aws-node-klbm7                    1/1     Running   0          26h    10.65.5.129   ip-10-65-5-129.us-west-2.compute.internal   <none>           <none>
kube-system   aws-node-rwj2g                    1/1     Running   0          68s    10.65.9.49    ip-10-65-9-49.us-west-2.compute.internal    <none>           <none>
kube-system   aws-node-t6pp2                    1/1     Running   0          26h    10.65.7.49    ip-10-65-7-49.us-west-2.compute.internal    <none>           <none>
kube-system   coredns-86d9946576-f7ns8          1/1     Running   0          26h    10.65.6.8     ip-10-65-7-49.us-west-2.compute.internal    <none>           <none>
kube-system   coredns-86d9946576-ts4jc          1/1     Running   0          26h    10.65.4.115   ip-10-65-5-129.us-west-2.compute.internal   <none>           <none>
kube-system   kube-proxy-6zq94                  1/1     Running   0          68s    10.65.9.49    ip-10-65-9-49.us-west-2.compute.internal    <none>           <none>
kube-system   kube-proxy-kgdr9                  1/1     Running   0          26h    10.65.5.129   ip-10-65-5-129.us-west-2.compute.internal   <none>           <none>
kube-system   kube-proxy-shxnz                  1/1     Running   0          26h    10.65.7.49    ip-10-65-7-49.us-west-2.compute.internal    <none>           <none>
kube-system   metrics-server-6594d67d48-s8vb7   1/1     Running   0          24h    10.65.4.230   ip-10-65-5-129.us-west-2.compute.internal   <none>           <none>
dewjam commented 2 years ago

Thanks, @Tasmana-banana. Unfortunately, I'm not able to reproduce this problem in my test setup. Just so I fully understand the situation, I have a couple more questions.

  1. You mentioned you upgraded Karpenter from 0.5.3 to 0.9.1 in this cluster. Did you follow the upgrade guide here?

  2. Looking at the output above, I see three nodes. I assume at least one of them was created by a Managed Node Group. Were the other two launched by Karpenter? If so, were they launched by Karpenter 0.9.1?

Tasmana-banana commented 2 years ago

Thanks for questions @dewjam

  1. I set up a completely new cluster with two nodes in a node group and installed Karpenter 0.9.1. Also, here is the part of the playbook that installs the Karpenter controller (a roughly equivalent plain Helm command is sketched after this list):

    kubernetes.core.helm:
      name: '{{ karpenter_namespace }}'
      create_namespace: true
      release_namespace: '{{ karpenter_namespace }}'
      chart_ref: 'karpenter/karpenter'
      chart_version: '0.9.1'
      release_values:
        serviceAccount:
          create: true
          name: 'karpenter'
          annotations:
            eks.amazonaws.com/role-arn: 'arn:aws:iam::{{ account_id }}:role/EKS-Karpenter-Role'
        clusterName: '{{ cluster_name }}'
        clusterEndpoint: '{{ cluster_endpoint }}'
        defaultProvisioner: false
        aws:
          defaultInstanceProfile: '{{ instance_profile }}'
  2. No, the other two nodes are part of the node group of my cluster.
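
For readers not using Ansible, a roughly equivalent plain Helm invocation for the task in item 1 might look like this (the shell variables are stand-ins for the Jinja variables above, and the karpenter Helm repo is assumed to be added already):

helm upgrade --install "${KARPENTER_NAMESPACE}" karpenter/karpenter \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --version 0.9.1 \
  --set serviceAccount.create=true \
  --set serviceAccount.name=karpenter \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${ACCOUNT_ID}:role/EKS-Karpenter-Role" \
  --set clusterName="${CLUSTER_NAME}" \
  --set clusterEndpoint="${CLUSTER_ENDPOINT}" \
  --set defaultProvisioner=false \
  --set aws.defaultInstanceProfile="${INSTANCE_PROFILE}"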

dewjam commented 2 years ago

OK, I see two of the inflate pods are running. Are they both running on the Managed Nodes? Or was one of those nodes launched by Karpenter?

ip-10-65-9-49.us-west-2.compute.internal looks like it was launched most recently, so I'm assuming this was launched by Karpenter, but wanted to be sure.
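
(Side note: one way to confirm this is that Karpenter labels the nodes it launches with the provisioner name, so a command along these lines should show which nodes it created; nodes from the managed node group will have an empty value in that column.)

kubectl get nodes -L karpenter.sh/provisioner-name -L node.kubernetes.io/instance-type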

Tasmana-banana commented 2 years ago

You're right, @dewjam! ip-10-65-9-49.us-west-2.compute.internal was launched by Karpenter, but the pods that I scaled do not migrate to it.

dewjam commented 2 years ago

Hey @Tasmana-banana, my apologies for the delayed response. Given that at least one node was launched, I'm doubtful a permissions issue is at play.

Have you tried an instance type other than t3a.medium? For example, can you try a t3.medium or an m5a.large?

Tasmana-banana commented 2 years ago

Hello @dewjam. Sorry for the delay on my side as well. I've just run out of ideas already. I tried adding a small instance type.

dewjam commented 2 years ago

Hey @Tasmana-banana , Just to confirm, were you able to try to add a different instance type to your provisioner spec?

dewjam commented 2 years ago

Hey @Tasmana-banana, just following up on this issue. Were you able to find a workaround?

Tasmana-banana commented 2 years ago

Hello @dewjam. Sorry, there is a war in my country and I can't investigate anything for now. You can close the issue. Maybe I will write to you in a few months. Take care of yourself!

dewjam commented 2 years ago

Our thoughts are with you @Tasmana-banana ! Be safe!

Feel free to re-open this whenever you're ready.

zakariais commented 1 year ago

@dewjam I'm getting this issue:

incompatible with provisioner "control-plane-egress", no instance type satisfied resources {"cpu":"600m","memory":"1152Mi","pods":"1"} and requirements karpenter.k8s.aws/instance-size NotIn [16xlarge 18xlarge 24xlarge 32xlarge 48xlarge and 5 others], kubernetes.io/arch In [amd64], project In [control-plane], intent In [egress-karpenter], nodegroup-name In [control-plane-egress], karpenter.sh/provisioner-name In [control-plane-egress], topology.kubernetes.io/zone In [us-east-1b us-east-1d us-east-1e], kubernetes.io/os In [linux], karpenter.k8s.aws/instance-family In [c5 c5d c5n c6a c6i and 16 others], karpenter.sh/capacity-type In [on-demand spot]

Are there any recommendations or solutions I can try? I have specified a subnet selector in the AWSNodeTemplate, trying to use a private subnet via its Name tag: subnetSelector: Name: eks-egress-nat (sketched more fully below).

Karpenter version: 0.21.1, EKS version: 1.23
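
For context, a minimal sketch of just the subnet-selector part of such an AWSNodeTemplate might look like this (the metadata name is assumed; the selector matches subnets by their Name tag, and the rest of the template is omitted):

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: control-plane-egress
spec:
  subnetSelector:
    Name: eks-egress-nat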

ellistarn commented 1 year ago

Can you open a new issue, @zakariais?

zakariais commented 1 year ago

@ellistarn Created: https://github.com/aws/karpenter/issues/3211