aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0
6.63k stars 924 forks source link

Karpenter reports "D" series HDD instances as having nvme #5468

Open ryanfaircloth opened 8 months ago

ryanfaircloth commented 8 months ago

Description

Observed Behavior: the current set of annotations forces D series machines with HDD to incorrectly report as nvme karpenter.k8s.aws/instance-local-nvme

Expected Behavior: This annotation should not be present on D series machines perhaps we could adopt karpenter.k8s.aws/instance-local-disk

Reproduction Steps (Please include YAML):

Incorrect mapping is present in docs https://karpenter.sh/docs/reference/instance-types/#d3enxlarge

Versions: na/all

jmdeal commented 8 months ago

This is actually pretty interesting because my first assumption would be NVME == SSD but in this case it looks like the the hard drives on d3en instance types are presented as NVME devices. This screenshot is from a d3en.xlarge instance:

Screenshot 2024-01-14 at 2 33 07 PM

This also matches the information from the EC2 API which we use to generate these labels. Given that I think the label is technically correct, albeit unexpected. Is the ask here to be able to better differentiate between SSD and Hard Drive local storage or was it to change the label name to be more representative?

njtran commented 8 months ago

Added a feature label to consider if we want to change/add any label key names.

ryanfaircloth commented 8 months ago

Right the need is to target workload to the correct performance profile some workloads need high sequential throughput great for HDD my use case is heavily random and I do have a work around but the result surprised me maybe there is a reach back to aws to address the api?

jmdeal commented 8 months ago

I think the current labels are correct and just surprising because NVME is normally associated with SSDs. I think there could be a good argument for introducing new labels that better model the different storage types. Just brainstorming but I could see something like this potentially being a good path forward:

Thoughts @ryanfaircloth @njtran @jonathan-innis?

jonathan-innis commented 8 months ago

and I do have a work around

Just for insight, what's the workaround that you are using? Just directly targeting or ignoring the instance types through selectors?

Just brainstorming but I could see something like this potentially being a good path forward

I could see an argument for this. What's the way that we would distinguish off of the type of NVME that it is? Is this surfaced through the DescribeInstanceTypes API or is there some other mechanism that we would intuit this from?

ryanfaircloth commented 8 months ago

Right just exclude d which is correct but requires I know that what the upstream api really means

On Wed, Jan 17, 2024 at 12:40 AM Jonathan Innis @.***> wrote:

and I do have a work around

Just for insight, what's the workaround that you are using? Just directly targeting or ignoring the instance types through selectors?

— Reply to this email directly, view it on GitHub https://github.com/aws/karpenter-provider-aws/issues/5468#issuecomment-1894976206, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIN6WODCLT6V2XSPASHMMCTYO5P6BAVCNFSM6AAAAABB2CFQ6GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJUHE3TMMRQGY . You are receiving this because you were mentioned.Message ID: @.***>

jmdeal commented 8 months ago

The InstanceStorageInfo struct we currently use for NVME information also has info on the individual disks, including if they're hdds or ssd:

type InstanceStorageInfo struct {
    _ struct{} `type:"structure"`

    // Describes the disks that are available for the instance type.
    Disks []*DiskInfo `locationName:"disks" locationNameList:"item" type:"list"`

    // Indicates whether data is encrypted at rest.
    EncryptionSupport *string `locationName:"encryptionSupport" type:"string" enum:"InstanceStorageEncryptionSupport"`

    // Indicates whether non-volatile memory express (NVMe) is supported.
    NvmeSupport *string `locationName:"nvmeSupport" type:"string" enum:"EphemeralNvmeSupport"`

    // The total size of the disks, in GB.
    TotalSizeInGB *int64 `locationName:"totalSizeInGB" type:"long"`
}

type DiskInfo struct {
    _ struct{} `type:"structure"`

    // The number of disks with this configuration.
    Count *int64 `locationName:"count" type:"integer"`

    // The size of the disk in GB.
    SizeInGB *int64 `locationName:"sizeInGB" type:"long"`

    // The type of disk.
    Type *string `locationName:"type" type:"string" enum:"DiskType"`
}

Assuming I'm not making any incorrect assumptions about this data (e.g. TotalSizeInGB is the sum of the total size of the individual disks) it should be pretty straightforward to derive these labels.