thelabdude closed this issue 1 year ago.
Can you do a describe on one of the nodes that launched, so we can look at the resources the node has after launch?
Yes, we have a few daemonsets on these nodes but they are tiny. Here's one of the nodes that came up:
Name: ip-xxx-yy-187-131.us-west-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m6id.xlarge
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=SPOT
failure-domain.beta.kubernetes.io/region=us-west-2
failure-domain.beta.kubernetes.io/zone=us-west-2c
k8s.io/cloud-provider-aws=96a469b634ca7e71303fa61fa2302c91
karpenter.k8s.aws/instance-ami-id=ami-0173eacf6deadbace
karpenter.k8s.aws/instance-category=m
karpenter.k8s.aws/instance-cpu=4
karpenter.k8s.aws/instance-encryption-in-transit-supported=true
karpenter.k8s.aws/instance-family=m6id
karpenter.k8s.aws/instance-generation=6
karpenter.k8s.aws/instance-hypervisor=nitro
karpenter.k8s.aws/instance-local-nvme=237
karpenter.k8s.aws/instance-memory=16384
karpenter.k8s.aws/instance-network-bandwidth=1562
karpenter.k8s.aws/instance-pods=58
karpenter.k8s.aws/instance-size=xlarge
karpenter.sh/capacity-type=spot
karpenter.sh/initialized=true
karpenter.sh/provisioner-name=karp-spot
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-xxx-yy-187-131.us-west-2.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=m6id.xlarge
topology.kubernetes.io/region=us-west-2
topology.kubernetes.io/zone=us-west-2c
vpc.amazonaws.com/has-trunk-attached=false
Annotations: alpha.kubernetes.io/provided-node-ip: xxx.yy.187.131
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 31 Mar 2023 13:01:25 -0600
Taints: tolerates-spot=true:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: ip-xxx-yy-187-131.us-west-2.compute.internal
AcquireTime: <unset>
RenewTime: Fri, 31 Mar 2023 13:02:35 -0600
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 31 Mar 2023 13:02:25 -0600 Fri, 31 Mar 2023 13:02:14 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 31 Mar 2023 13:02:25 -0600 Fri, 31 Mar 2023 13:02:14 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 31 Mar 2023 13:02:25 -0600 Fri, 31 Mar 2023 13:02:14 -0600 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 31 Mar 2023 13:02:25 -0600 Fri, 31 Mar 2023 13:02:25 -0600 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: xxx.yy.187.131
Hostname: ip-xxx-yy-187-131.us-west-2.compute.internal
InternalDNS: ip-xxx-yy-187-131.us-west-2.compute.internal
Capacity:
attachable-volumes-aws-ebs: 39
cpu: 4
ephemeral-storage: 231332304Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16119004Ki
pods: 58
vpc.amazonaws.com/pod-eni: 18
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 3920m
ephemeral-storage: 212122109190
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15102172Ki
pods: 58
vpc.amazonaws.com/pod-eni: 18
System Info:
Machine ID: ec28d4981141e106d3637450a82dc2bc
System UUID: ec20e910-20e8-51dc-e75d-f9d759bbd4cb
Boot ID: 5ca81981-a600-4a67-b95f-f02e3c2560fb
Kernel Version: 5.4.228-132.418.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.6
Kubelet Version: v1.23.15-eks-49d8fe8
Kube-Proxy Version: v1.23.15-eks-49d8fe8
ProviderID: aws:///us-west-2c/i-0435cb3cb4c7effbd
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system aws-node-wmkgv 30m (0%) 0 (0%) 0 (0%) 0 (0%) 76s
kube-system ebs-csi-node-ksjf8 150m (3%) 300m (7%) 120Mi (0%) 768Mi (5%) 77s
kube-system kube-proxy-c5q9g 100m (2%) 0 (0%) 0 (0%) 0 (0%) 77s
prometheus-stack mon-prometheus-node-exporter-978sf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 76s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 280m (7%) 300m (7%)
memory 120Mi (0%) 768Mi (5%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
vpc.amazonaws.com/pod-eni 1 1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeAccepted 76s yunikorn node ip-xxx-yy-187-131.us-west-2.compute.internal is accepted by the scheduler
Normal RegisteredNode 73s node-controller Node ip-xxx-yy-187-131.us-west-2.compute.internal event: Registered Node ip-xxx-yy-187-131.us-west-2.compute.internal in Controller
Normal Starting 28s kubelet Starting kubelet.
Warning InvalidDiskCapacity 28s kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 28s (x3 over 28s) kubelet Node ip-xxx-yy-187-131.us-west-2.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 28s (x3 over 28s) kubelet Node ip-xxx-yy-187-131.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 28s (x3 over 28s) kubelet Node ip-xxx-yy-187-131.us-west-2.compute.internal status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 28s kubelet Updated Node Allocatable limit across pods
Normal Starting 24s kube-proxy Starting kube-proxy.
Normal NodeReady 17s kubelet Node ip-xxx-yy-187-131.us-west-2.compute.internal status is now: NodeReady
What does your AWSNodeTemplate look like?
The only differences in the AWSNodeTemplate between spot and on-demand are: 1) the spot one has a script in userData that mounts the ephemeral disk, and 2) the on-demand one declares a blockDeviceMapping for mounting an EBS volume.
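A simplified sketch of that kind of userData mount step (not the exact script; the device name, mount point, and the MIME multipart wrapper Karpenter merges for the AL2 AMI family are all assumptions here):

```yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: karp-spot-template   # placeholder name
spec:
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    # Format the local NVMe instance store and mount it where the kubelet keeps
    # ephemeral data, so the node reports the ~220Gi seen in the describe above.
    # /dev/nvme1n1 is an assumption -- the device name varies by instance type.
    mkfs.xfs /dev/nvme1n1
    mkdir -p /var/lib/kubelet
    mount /dev/nvme1n1 /var/lib/kubelet
    --BOUNDARY--
```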
I removed the ephemeral-storage: 5Gi from my deployment spec and now Karpenter is only allocating a single r6id.xlarge instance initially. So is there anything I need to specify to tell Karpenter I'm using the ephemeral disk for ephemeral storage?
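(The request in question looked roughly like this in the container spec — an illustrative fragment, not the full deployment:)

```yaml
# Illustrative fragment only -- not the full deployment spec
resources:
  requests:
    ephemeral-storage: 5Gi   # the request that was removed
```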
> the spot one has a script in userData that mounts the ephemeral disk
Are you mounting an ephemeral-storage disk that Karpenter is unaware of? What's most likely happening here is that Karpenter assumes your ephemeral-storage capacity is 20Gi by default if you don't specify blockDeviceMappings in your AWSNodeTemplate. That means this is what it assumes when it's scheduling, which is why it's breaking your workloads up into separate nodes.
Once the node comes up, it sees that there's actually ~220Gi of storage, so it's able to consolidate down all the nodes it just launched.
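To spell that out, declaring the capacity via blockDeviceMappings looks roughly like this (size and device name are placeholders, not a recommendation):

```yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: karp-spot-template   # placeholder name
spec:
  blockDeviceMappings:
    - deviceName: /dev/xvda  # root device of the EKS-optimized AL2 AMI
      ebs:
        volumeSize: 200Gi    # this is the ephemeral-storage the scheduler will assume
        volumeType: gp3
        deleteOnTermination: true
```

The trade-off discussed further down is that this provisions an EBS volume of one fixed size for every instance size, rather than using the local NVMe.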
Yes, we're mounting the NVMe disks that come with d instances. Seems like Karpenter should support this.
> Yes, we're mounting the NVMe disks that come with d instances. Seems like Karpenter should support this
Agreed, we're tracking this issue here: #2723. It's a bit complex because we either have to support instance type overrides or some sort of annotation-based mechanism (like CAS has) for understanding what your ephemeral-storage will actually be, or we have to do the mounting for you.
@bwagner5 was actually doing some work in the AL2 AMI to automatically RAID0 the instance store volumes and mount them by default, which would mean that, once that AMI was released and widely adopted, Karpenter could at minimum assume the volume size for those instances.
Do you have any thoughts or expectations on how you would like to see Karpenter handle this case?
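(For context, "RAID0 the instance store volumes" amounts to something like the following on a node with multiple local NVMe disks — a generic sketch, not the actual AMI change:)

```bash
# Generic sketch: stripe two local NVMe instance-store disks into one array,
# then format and mount it for kubelet ephemeral storage.
# Device names and mount point are assumptions and vary by instance type.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
mkfs.xfs /dev/md0
mkdir -p /var/lib/kubelet
mount /dev/md0 /var/lib/kubelet
```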
Unfortunately, this is the second time this 20Gi default has bitten me :-( If you look at the log I posted, there's zero indication that more instances are needed because a pod's ephemeral-storage request didn't fit into the default 20Gi.
Part of the disconnect here is that I thought Karpenter looked up this metadata in AWS; I had to go through the code again to realize Karpenter does not look up the instance storage for the specified types. Can it not look that up in some metadata service?
I don't have a strong opinion on how this should be solved right now; I need to read through all the links related to #2723 more carefully. The AMI approach sounds promising.
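(For what it's worth, the instance-store size does seem to be available via the EC2 DescribeInstanceTypes API rather than the instance metadata service; an illustrative query:)

```bash
# Returns InstanceStorageInfo, including TotalSizeInGB, for the given type
aws ec2 describe-instance-types \
  --instance-types m6id.xlarge \
  --query 'InstanceTypes[].InstanceStorageInfo'
```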
Naively though, why not just let me specify a mapping of instance type / size to ephemeral storage in the Provisioner? People end up doing all kinds of funky things with these disks, so an optional mapping config where I can specify xlarge = 237, 2xlarge = 474, etc. would at least be better than what I have now: not using d instances and having to fit a single EBS vol size to all instance sizes.
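Purely to illustrate the idea (this is a hypothetical field — nothing like it exists in Karpenter today):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: karp-spot
spec:
  # ...existing requirements, providerRef, etc...
  instanceStorageOverrides:   # hypothetical field, for illustration only
    xlarge: 237Gi
    2xlarge: 474Gi
```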
Renaming this issue because, beyond #2723, I think the logs need to report the ephemeral disk as part of the instance selection during scale-up, especially when the 20Gi default has been applied. If you look at the logs I posted earlier:
2023-03-31T18:20:38.857Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
2023-03-31T18:20:38.866Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
2023-03-31T18:20:38.876Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
2023-03-31T18:20:38.887Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
2023-03-31T18:20:38.897Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
2023-03-31T18:20:38.908Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
2023-03-31T18:20:38.918Z INFO controller.provisioner launching machine with 3 pods requesting {"cpu":"526m","ephemeral-storage":"15Gi","memory":"1125Mi","pods":"7","vpc.amazonaws.com/pod-eni":"1"} from types c6id.8xlarge, m6id.2xlarge, r6id.xlarge, r6id.2xlarge, r6id.16xlarge and 19 other(s) {"commit": "7131be2-dirty", "provisioner": "karp-spot"}
There's nothing to indicate that the instances have the default 20Gi ephemeral storage limit imposed. Of course, if #2723 gets fixed, maybe this issue just goes away.
@thelabdude We're now logging the selected instance type capacity after the capacity is launched (#3695). Hopefully this helps track down issues in the future around ephemeral-storage constraints.
I think I'm going to close this at this point since we'll track the instance storage ask in #2723. Feel free to re-open if you have additional thoughts or comments.
Version
Karpenter Version: v0.27.1
Kubernetes Version: v1.23.16-eks-48e63af
Expected Behavior
Karpenter is initially over-allocating instances (a poor fit for the pending pods) for a deployment that prefers spot instances; the behavior for on-demand seems correct (or at least a better initial fit). Consolidation gets the fit right, but this leads to unnecessary pod evictions very soon after the pods start. My sense is that something is off with Karpenter's initial calculations for spot here, but I'm not sure.
Actual Behavior
Doing some basic comparisons of behavior between spot and on-demand using a simple deployment. With spot, I'm seeing Karpenter spin up way too many instances and then consolidate them down to a right-sized instance in a second pass (see logs). When using on-demand, Karpenter seems to do a better fit initially. The main difference between my spot and on-demand configurations is that I'm using d instances for spot (e.g. r6id) but non-d instances for on-demand (EBS-only instance types like r6i). My logic here is that since Karpenter currently forces me to have a one-size-fits-all EBS vol for all instance sizes (https://github.com/aws/karpenter/issues/2723), I'll take the price hit with d instance spots vs. EBS-only spots. That's probably irrelevant to this issue but wanted to mention it just in case.

When I initially submit the deployment that prefers spot, I see these instances being started by Karpenter:
After about 2 minutes, Karpenter consolidates down to a single node (see logs below showing this activity):
Steps to Reproduce the Problem
Here's my simple test deployment:
Note that I mutate the pod using OPA to add:
And a toleration for a taint my AWSNodeTemplate adds to the spot nodes.
Resource Specs and Logs
Here's the provisioner for spot:
Here's the relevant log section showing the activity when the deployment is added to the cluster: