aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

blockDeviceMappings ignored for root volume #6358

Closed RobinFrcd closed 3 months ago

RobinFrcd commented 3 months ago

Description

Hi, I'm trying to deploy GPU nodes with Karpenter. Since GPU Docker images are usually quite large, I need to increase the root volume size, as described in the docs: https://karpenter.sh/docs/concepts/nodeclasses/#al2-1

I'm using the recommended GPU AMI (amazon-linux-2-gpu) provided here: https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html.

Here's my EC2NodeClass:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "{{ .Values.nodeIAMRole }}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "{{ .Values.clusterName }}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "{{ .Values.clusterName }}" 
  amiSelectorTerms:
    - id: "{{ .Values.gpuAmiID }}"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 60Gi
        volumeType: gp3
        encrypted: true

However, nodes deployed with this class are stuck with the default 20 GiB volume:

Capacity:
  cpu:                4
  ephemeral-storage:  20959212Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16069064Ki
  pods:               29

jmdeal commented 3 months ago

Which version of Karpenter are you using? Can you share your NodePool spec and the affected Node and NodeClaim specs as well? A quick test in my own cluster on 0.37.0 works as expected. Can you also check whether the EC2 instance was launched with a 20 GiB volume or a 60 GiB volume?

RobinFrcd commented 3 months ago

Thanks for your answer! While copying the specs you asked for, I realized I had

      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default

in my NodePool. I replaced it with name: gpu and now it works as expected.
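
For anyone hitting the same symptom, the corrected reference would look roughly like the sketch below. Only the nodeClassRef fields come from this thread; the nesting under spec.template.spec is assumed from the v1beta1 NodePool schema, and all other NodePool fields are omitted.

```yaml
# Sketch of the relevant NodePool fragment after the fix.
# Assumption: nodeClassRef sits under spec.template.spec (v1beta1 NodePool);
# the rest of the NodePool is left out.
spec:
  template:
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: gpu # was "default", so the gpu EC2NodeClass and its blockDeviceMappings were never used
```

Since blockDeviceMappings are defined on the EC2NodeClass, a NodePool that references a different class launches nodes with that class's volume settings instead.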

Sorry for the false alarm!