aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Pod cannot be scheduled to instance storage node #5169

Open runningman84 opened 11 months ago

runningman84 commented 11 months ago

Description

Observed Behavior: If I schedule this pod:

apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  nodeSelector:
    karpenter.sh/nodepool: x86-instance-store
  tolerations:
  - key: "instance-store"
    value: "true"
    operator: "Equal"
    effect: "NoSchedule"
  containers:
  - name: curl
    image: curlimages/curl
    command: [ "sleep", "600" ]
    resources:
      requests:
        cpu: 1
        memory: 256Mi

I get this error:

{"level":"ERROR","time":"2023-11-27T15:58:25.260Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme In [50], karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, key karpenter.k8s.aws/instance-local-nvme, karpenter.k8s.aws/instance-local-nvme In [50] not in karpenter.k8s.aws/instance-local-nvme DoesNotExist; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}
apiVersion: v1
items:
- apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    annotations:
      karpenter.sh/nodepool-hash: "14551328993050931519"
    creationTimestamp: "2023-11-27T10:20:30Z"
    generation: 1
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: arm-instance-store
    resourceVersion: "868412088"
    uid: ff78546d-28a9-452e-b7c0-2cc5cfb224b2
  spec:
    disruption:
      consolidationPolicy: WhenUnderutilized
      expireAfter: 604800s
    limits:
      cpu: 100
    template:
      metadata: {}
      spec:
        kubelet:
          evictionHard:
            memory.available: 0.2Gi
            nodefs.available: 10%
            nodefs.inodesFree: 10%
          evictionMaxPodGracePeriod: 180
          evictionSoft:
            memory.available: 500Mi
            nodefs.available: 15%
            nodefs.inodesFree: 15%
          evictionSoftGracePeriod:
            memory.available: 3m0s
            nodefs.available: 3m0s
            nodefs.inodesFree: 3m0s
          kubeReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
          systemReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
        nodeClassRef:
          name: bottlerocket-instance-store
        requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
          - spot
        - key: kubernetes.io/arch
          operator: In
          values:
          - arm64
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
          - nitro
        - key: karpenter.k8s.aws/instance-local-nvme
          operator: Gt
          values:
          - "45"
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        taints:
        - effect: NoSchedule
          key: arch
          value: arm64
        - effect: NoSchedule
          key: instance-store
          value: "true"
    weight: 30
- apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    annotations:
      karpenter.sh/nodepool-hash: "14918018216109419487"
    creationTimestamp: "2023-11-27T10:20:30Z"
    generation: 1
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: x86-instance-store
    resourceVersion: "868561144"
    uid: faea5b2c-85a0-476d-aee3-de19b0df14cc
  spec:
    disruption:
      consolidationPolicy: WhenUnderutilized
      expireAfter: 604800s
    limits:
      cpu: 100
    template:
      metadata: {}
      spec:
        kubelet:
          evictionHard:
            memory.available: 0.2Gi
            nodefs.available: 10%
            nodefs.inodesFree: 10%
          evictionMaxPodGracePeriod: 180
          evictionSoft:
            memory.available: 500Mi
            nodefs.available: 15%
            nodefs.inodesFree: 15%
          evictionSoftGracePeriod:
            memory.available: 3m0s
            nodefs.available: 3m0s
            nodefs.inodesFree: 3m0s
          kubeReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
          systemReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
        nodeClassRef:
          name: bottlerocket-instance-store
        requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
          - spot
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
          - c
          - m
          - r
          - x
          - z
          - hpc
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
          - "3"
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
          - nitro
        - key: karpenter.k8s.aws/instance-local-nvme
          operator: Gt
          values:
          - "45"
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        taints:
        - effect: NoSchedule
          key: instance-store
          value: "true"
    weight: 20
  status: {}
kind: List
metadata:
  resourceVersion: ""

Expected Behavior: The pod should be scheduled to a node that has instance storage available.

Reproduction Steps (Please include YAML): see above

Versions:

jmdeal commented 11 months ago

Interesting. I applied your nodepool and pod to a cluster with a fresh install of Karpenter v0.32.2 and didn't have any issues. What region / AZ are you trying to create nodes in? Your requirements seem loose enough, but there could genuinely be no available instances in that AZ which meet them (I was able to successfully launch in us-west-2a).
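
One way to sanity-check availability is to list the instance types in a region that actually have local instance storage via the AWS CLI, e.g. for us-west-2:

$ aws ec2 describe-instance-types --region us-west-2 \
    --filters "Name=instance-storage-supported,Values=true" "Name=hypervisor,Values=nitro" \
    --query "InstanceTypes[].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
    --output table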

jmdeal commented 11 months ago

Also, it looks like there might be some discrepancy between the NodePools you posted and the ones that were in use when the error logs occurred. I'm basing this on the error stating that it couldn't find a node matching the requirement karpenter.k8s.aws/instance-local-nvme In [50], versus karpenter.k8s.aws/instance-local-nvme Exists >45. I'm wondering if there could be some other difference in the original NodePool that could have caused the error.

runningman84 commented 11 months ago

Okay, maybe that error occurred on my first try, where I specified the instance-local-nvme value in the nodeSelector instead of the nodepool (a reconstruction follows at the end of this comment). This is a new error:

{"level":"ERROR","time":"2023-11-27T18:37:26.984Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"

I am using region eu-central-1.
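
For context, that first try presumably set the requirement directly in the pod spec, roughly like this reconstruction (matching the instance-local-nvme In [50] requirement from the earlier error, not necessarily the exact original manifest):

  nodeSelector:
    karpenter.k8s.aws/instance-local-nvme: "50"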

jmdeal commented 11 months ago

I'll go ahead and spin up a cluster in eu-central-1 and see if I can reproduce; it should have instance types which satisfy the requirements. Does this only occur when you specify the karpenter.k8s.aws/instance-local-nvme requirement, or does it occur without that requirement as well?

jmdeal commented 11 months ago

I was able to spin up a cluster and successfully launch instances in eu-central-1. Are you able to share more complete logs and your current nodepool / ec2nodeclasses?

runningman84 commented 11 months ago

Okay, here are all EC2NodeClass objects (there is only one):

$ kubectl get ec2nodeclass -o yaml
apiVersion: v1
items:
- apiVersion: karpenter.k8s.aws/v1beta1
  kind: EC2NodeClass
  metadata:
    annotations:
      karpenter.k8s.aws/ec2nodeclass-hash: "9990827383292664005"
    creationTimestamp: "2023-11-27T11:49:02Z"
    finalizers:
    - karpenter.k8s.aws/termination
    generation: 3
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: bottlerocket-instance-store
    resourceVersion: "869080037"
    uid: bc5437cd-76da-4b3a-95e5-b57dbb0daed0
  spec:
    amiFamily: Bottlerocket
    blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 10Gi
        volumeType: gp3
    - deviceName: /dev/xvdb
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 2Gi
        volumeType: gp3
    metadataOptions:
      httpEndpoint: enabled
      httpProtocolIPv6: disabled
      httpPutResponseHopLimit: 2
      httpTokens: required
    role: KarpenterNodeRole-dev-example-digital-products
    securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: dev-example-digital-products
    subnetSelectorTerms:
    - tags:
        kubernetes.io/role/internal-elb: "1"
    tags:
      eks:cluster-name: dev-example-digital-products
    userData: |
      [settings.bootstrap-containers.setup-runtime-storage-full]
      source = "ghcr.io/arvatoaws-labs/setup-runtime-storage-full:latest"
      mode = "always"
      essential = true
  status:
    amis:
    - id: ami-018d45a4d762f019d
      name: bottlerocket-aws-k8s-1.28-nvidia-aarch64-v1.16.1-763f6d4c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: Exists
    - id: ami-018d45a4d762f019d
      name: bottlerocket-aws-k8s-1.28-nvidia-aarch64-v1.16.1-763f6d4c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: Exists
    - id: ami-02ccd2239604b18af
      name: bottlerocket-aws-k8s-1.28-aarch64-v1.16.1-763f6d4c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: DoesNotExist
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: DoesNotExist
    - id: ami-0e71498b4e4e56727
      name: bottlerocket-aws-k8s-1.28-nvidia-x86_64-v1.16.1-763f6d4c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: Exists
    - id: ami-0e71498b4e4e56727
      name: bottlerocket-aws-k8s-1.28-nvidia-x86_64-v1.16.1-763f6d4c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: Exists
    - id: ami-095132eeb54fa060c
      name: bottlerocket-aws-k8s-1.28-x86_64-v1.16.1-763f6d4c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: DoesNotExist
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: DoesNotExist
    instanceProfile: dev-example-digital-products_16990360688165669647
    securityGroups:
    - id: sg-02a48f5a112c61d41
      name: eks-cluster-sg-dev-example-digital-products-1067367652
    subnets:
    - id: subnet-01a9caa728107681b
      zone: eu-central-1c
    - id: subnet-06922fee133aa1aad
      zone: eu-central-1b
    - id: subnet-034ed6ad2fdd2b6b7
      zone: eu-central-1a
    - id: subnet-0032dcd161010436d
      zone: eu-central-1b
    - id: subnet-0351e0d4d38489baa
      zone: eu-central-1c
    - id: subnet-01242a5d0ee3e2ce2
      zone: eu-central-1a
kind: List
metadata:
  resourceVersion: ""

Here are the latest Karpenter logs:

{"level":"ERROR","time":"2023-11-28T07:49:13.575Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}
{"level":"ERROR","time":"2023-11-28T07:49:22.662Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}
{"level":"ERROR","time":"2023-11-28T07:49:32.582Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}
{"level":"ERROR","time":"2023-11-28T07:49:42.665Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}
{"level":"ERROR","time":"2023-11-28T07:49:53.270Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}
{"level":"ERROR","time":"2023-11-28T07:50:03.074Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"arm-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, did not tolerate arch=arm64:NoSchedule; incompatible with provisioner \"arm\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, did not tolerate arch=arm64:NoSchedule; incompatible with nodepool \"x86-instance-store\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"504Mi\",\"pods\":\"4\"}, no instance type satisfied resources {\"cpu\":\"280m\",\"memory\":\"760Mi\",\"pods\":\"5\"} and requirements karpenter.k8s.aws/instance-category In [c hpc m r x and 1 others], karpenter.k8s.aws/instance-generation Exists >3, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-local-nvme Exists >45, karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [x86-instance-store], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with provisioner \"x86\", daemonset overhead={\"cpu\":\"205m\",\"memory\":\"674Mi\",\"pods\":\"7\"}, incompatible requirements, label \"karpenter.sh/nodepool\" does not have known values; all available instance types exceed limits for provisioner: \"cron\"","commit":"5eda5c1","pod":"default/testpod"}

Here are the NodePools:

$ kubectl get nodepool -o yaml
apiVersion: v1
items:
- apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    annotations:
      karpenter.sh/nodepool-hash: "14551328993050931519"
    creationTimestamp: "2023-11-27T10:20:30Z"
    generation: 1
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: arm-instance-store
    resourceVersion: "868412088"
    uid: ff78546d-28a9-452e-b7c0-2cc5cfb224b2
  spec:
    disruption:
      consolidationPolicy: WhenUnderutilized
      expireAfter: 604800s
    limits:
      cpu: 100
    template:
      metadata: {}
      spec:
        kubelet:
          evictionHard:
            memory.available: 0.2Gi
            nodefs.available: 10%
            nodefs.inodesFree: 10%
          evictionMaxPodGracePeriod: 180
          evictionSoft:
            memory.available: 500Mi
            nodefs.available: 15%
            nodefs.inodesFree: 15%
          evictionSoftGracePeriod:
            memory.available: 3m0s
            nodefs.available: 3m0s
            nodefs.inodesFree: 3m0s
          kubeReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
          systemReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
        nodeClassRef:
          name: bottlerocket-instance-store
        requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
          - spot
        - key: kubernetes.io/arch
          operator: In
          values:
          - arm64
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
          - nitro
        - key: karpenter.k8s.aws/instance-local-nvme
          operator: Gt
          values:
          - "45"
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        taints:
        - effect: NoSchedule
          key: arch
          value: arm64
        - effect: NoSchedule
          key: instance-store
          value: "true"
    weight: 30
- apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    annotations:
      karpenter.sh/nodepool-hash: "14918018216109419487"
    creationTimestamp: "2023-11-27T10:20:30Z"
    generation: 1
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: x86-instance-store
    resourceVersion: "868561144"
    uid: faea5b2c-85a0-476d-aee3-de19b0df14cc
  spec:
    disruption:
      consolidationPolicy: WhenUnderutilized
      expireAfter: 604800s
    limits:
      cpu: 100
    template:
      metadata: {}
      spec:
        kubelet:
          evictionHard:
            memory.available: 0.2Gi
            nodefs.available: 10%
            nodefs.inodesFree: 10%
          evictionMaxPodGracePeriod: 180
          evictionSoft:
            memory.available: 500Mi
            nodefs.available: 15%
            nodefs.inodesFree: 15%
          evictionSoftGracePeriod:
            memory.available: 3m0s
            nodefs.available: 3m0s
            nodefs.inodesFree: 3m0s
          kubeReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
          systemReserved:
            cpu: 250m
            ephemeral-storage: 1Gi
            memory: 200Mi
        nodeClassRef:
          name: bottlerocket-instance-store
        requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
          - spot
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
          - c
          - m
          - r
          - x
          - z
          - hpc
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
          - "3"
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
          - nitro
        - key: karpenter.k8s.aws/instance-local-nvme
          operator: Gt
          values:
          - "45"
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        taints:
        - effect: NoSchedule
          key: instance-store
          value: "true"
    weight: 20
  status: {}
kind: List
metadata:
  resourceVersion: ""

One side note: this cluster still has two normal provisioners running, serving the regular workload.

$ kubectl get provisioner -o yaml
apiVersion: v1
items:
- apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    annotations:
      karpenter.sh/provisioner-hash: "12353849731795830819"
    creationTimestamp: "2023-06-06T08:52:30Z"
    generation: 5
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-crds
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: arm
    resourceVersion: "868901858"
    uid: 83dfcfec-4a43-4c5d-a9a5-b7ce66a4a9d8
  spec:
    consolidation:
      enabled: true
    kubeletConfiguration:
      evictionHard:
        memory.available: 0.2Gi
        nodefs.available: 10%
        nodefs.inodesFree: 10%
      evictionMaxPodGracePeriod: 180
      evictionSoft:
        memory.available: 500Mi
        nodefs.available: 15%
        nodefs.inodesFree: 15%
      evictionSoftGracePeriod:
        memory.available: 3m0s
        nodefs.available: 3m0s
        nodefs.inodesFree: 3m0s
      kubeReserved:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 200Mi
      systemReserved:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 200Mi
    limits:
      resources:
        cpu: "100"
    providerRef:
      name: bottlerocket
    requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values:
      - on-demand
      - spot
    - key: kubernetes.io/arch
      operator: In
      values:
      - arm64
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values:
      - nitro
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values:
      - c
      - m
      - r
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values:
      - "2"
    taints:
    - effect: NoSchedule
      key: arch
      value: arm64
    ttlSecondsUntilExpired: 86400
    weight: 30
  status:
    resources:
      cpu: "5"
      ephemeral-storage: 307002Mi
      memory: 23871964Ki
      pods: "66"
- apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    annotations:
      karpenter.sh/provisioner-hash: "6229279867979954262"
    creationTimestamp: "2022-12-19T17:01:38Z"
    generation: 18
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: cron
    resourceVersion: "869149675"
    uid: 42ec341d-9f5e-4afe-a796-fd49d4fc1c28
  spec:
    kubeletConfiguration:
      evictionHard:
        memory.available: 0.2Gi
        nodefs.available: 10%
        nodefs.inodesFree: 10%
      evictionMaxPodGracePeriod: 180
      evictionSoft:
        memory.available: 500Mi
        nodefs.available: 15%
        nodefs.inodesFree: 15%
      evictionSoftGracePeriod:
        memory.available: 3m0s
        nodefs.available: 3m0s
        nodefs.inodesFree: 3m0s
      kubeReserved:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 200Mi
      systemReserved:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 200Mi
    limits:
      resources:
        cpu: "4"
    providerRef:
      name: bottlerocket
    requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values:
      - on-demand
      - spot
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values:
      - c
      - m
      - r
      - t
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values:
      - nitro
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
    taints:
    - effect: NoSchedule
      key: provisioner
      value: cron
    ttlSecondsAfterEmpty: 900
    ttlSecondsUntilExpired: 7200
    weight: 10
  status:
    resources:
      cpu: "4"
      ephemeral-storage: 204668Mi
      memory: 3860088Ki
      pods: "22"
- apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    annotations:
      karpenter.sh/provisioner-hash: "4436524457269508529"
    creationTimestamp: "2023-06-06T08:50:12Z"
    generation: 6
    labels:
      kustomize.toolkit.fluxcd.io/name: karpenter-crds
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: x86
    resourceVersion: "869076595"
    uid: e0e787dd-61de-4cf7-a758-46676d0f0551
  spec:
    consolidation:
      enabled: true
    kubeletConfiguration:
      evictionHard:
        memory.available: 0.2Gi
        nodefs.available: 10%
        nodefs.inodesFree: 10%
      evictionMaxPodGracePeriod: 180
      evictionSoft:
        memory.available: 500Mi
        nodefs.available: 15%
        nodefs.inodesFree: 15%
      evictionSoftGracePeriod:
        memory.available: 3m0s
        nodefs.available: 3m0s
        nodefs.inodesFree: 3m0s
      kubeReserved:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 200Mi
      systemReserved:
        cpu: 250m
        ephemeral-storage: 1Gi
        memory: 200Mi
    limits:
      resources:
        cpu: "100"
    providerRef:
      name: bottlerocket
    requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values:
      - on-demand
      - spot
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values:
      - c
      - m
      - r
      - x
      - z
      - hpc
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values:
      - "3"
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values:
      - nitro
    - key: karpenter.k8s.aws/instance-cpu
      operator: In
      values:
      - "2"
      - "4"
    - key: karpenter.k8s.aws/instance-local-nvme
      operator: DoesNotExist
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
    ttlSecondsUntilExpired: 86400
    weight: 20
  status:
    resources:
      cpu: "8"
      ephemeral-storage: 409336Mi
      memory: 27635924Ki
      pods: "116"
kind: List
metadata:
  resourceVersion: ""

The provisioners will be migrated to NodePools once our NodePool tests work out fine.

runningman84 commented 11 months ago

I have reproduced the same issue on another cluster with these files:

runningman84 commented 11 months ago

OK, I have found the underlying issue!

If you specify a block device mapping with volume sizes below those of the AMI's default block device mapping, you get this error.

The problem is that the error message is misleading: (no instance type has enough resources)

Even in the debug logs you do not see the real cause of this issue.

AMI ID: ami-095132eeb54fa060c

Block devices /dev/xvda=snap-0e9f680174aca839c:2:true:gp2 /dev/xvdb=snap-06ae575923377eefb:20:true:gp2
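
In other words: the EC2NodeClass above asks for a 2Gi volume on /dev/xvdb, while the AMI's default mapping for that device is backed by a 20 GiB snapshot, and EC2 rejects a volume smaller than its source snapshot. A minimal sketch of the mismatch (volume sizes from the EC2NodeClass above, snapshot sizes from the block device listing):

blockDeviceMappings:
- deviceName: /dev/xvda
  ebs:
    volumeSize: 10Gi  # AMI default snap-0e9f680174aca839c is 2 GiB, so this is fine
- deviceName: /dev/xvdb
  ebs:
    volumeSize: 2Gi   # AMI default snap-06ae575923377eefb is 20 GiB, so this is rejected

The AMI's default mappings can be inspected with, e.g.:

$ aws ec2 describe-images --image-ids ami-095132eeb54fa060c --query "Images[].BlockDeviceMappings"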

This issue is also linked to https://github.com/aws/karpenter/issues/5180

jmdeal commented 11 months ago

Great to hear that you were able to root-cause this. I'll try to reproduce the issue with your nodeclass and see if there's anything we can do to improve the error message.

github-actions[bot] commented 11 months ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.

runningman84 commented 10 months ago

@jmdeal the bot which auto-closes issues does not encourage anybody to file additional bug reports. If these issues are auto-closed within 2 weeks, how are you going to fix any complex issue?

jmdeal commented 10 months ago

This issue closing is a bit of a miss on my part; I should have removed the question label and replaced it with feature. I hadn't had a chance to take a closer look at this before, but it seems like it's not so much a bug as a logging enhancement. Let me know if I'm mistaken.

Generally we auto-close issues with the question label, because that indicates we're waiting on further information. Often this is to determine whether the issue is a bug in the first place, rather than a misconfiguration. Once we determine something is a bug, the bug label is re-added and the bot won't close the issue.