aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Cannot customise spec.kubelet.flags on AL2023 due to quoting of node labels #6456

Open ohookins opened 2 weeks ago

ohookins commented 2 weeks ago

Description

Observed Behavior: Customising the NodeClass with userdata in the format specified for AL2023 with custom Kubelet flags causes the Kubelet to crash loop. It is unable to start.

Expected Behavior: Customising the NodeClass with userdata in the format specified for AL2023 allows you to customise Kubelet flags.

Reproduction Steps (Please include YAML):

Due to how the customised node labels are passed through userdata and nodeadm, we can't further customise Kubelet flags when using AL2023 (or presumably any other OS using nodeadm).

When defining our EC2NodeClass (only relevant config shown):

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
spec:
  amiFamily: AL2023
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        flags:
          - --registry-qps=0

What ends up in /etc/eks/kubelet/environment:

NODEADM_KUBELET_ARGS="--kubeconfig=/var/lib/kubelet/kubeconfig --image-credential-provider-bin-dir=/etc/eks/image-credential-provider --image-credential-provider-config=/etc/eks/image-credential-provider/config.json --node-ip=10.13.36.108 --cloud-provider=external --hostname-override=ip-10-13-36-108.ec2.internal --config=/etc/kubernetes/kubelet/config.json --config-dir=/etc/kubernetes/kubelet/config.json.d --node-labels="karpenter.sh/capacity-type=spot,karpenter.sh/nodepool=default" --registry-qps=0"

Because the node labels are wrapped in double quotes inside a value that is itself double-quoted, the flags appended after them are interpreted as a continuation of the node labels, and kubelet cannot start:

Jul 06 08:18:48 ip-10-13-36-108.ec2.internal kubelet[2072]: E0706 08:18:48.510504    2072 run.go:74] "command failed" err="failed to validate kubelet flags: invalid node labels: 'default --registry-qps=0' - a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"
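
The validation failure in the log above can be reproduced outside kubelet. This is a minimal Python sketch that assumes only the regular expression quoted in the error message. The function name `is_valid_label_value` is hypothetical, and this is not kubelet's actual implementation.

```python
import re

# Regex copied from the kubelet error message above.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value: str) -> bool:
    """Return True if the whole string matches the label-value regex."""
    return LABEL_VALUE_RE.fullmatch(value) is not None


# The intended value for karpenter.sh/nodepool passes.
print(is_valid_label_value("default"))  # True
# The value kubelet actually received contains a space, so it fails.
print(is_valid_label_value("default --registry-qps=0"))  # False
```

The space between `default` and `--registry-qps=0` is enough to fail validation, which matches the rejected value `'default --registry-qps=0'` in the log.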

ohookins commented 2 weeks ago

The unfortunate workaround is to add the extra flag into the kubelet config directly:

  userData: |
    #!/bin/bash
    echo "$(jq '.registryPullQPS=0' /etc/kubernetes/kubelet/config.json)" > /etc/kubernetes/kubelet/config.json
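
For readers who prefer not to depend on `jq`, the same in-place edit might look roughly like this in Python. It is a sketch under assumptions: the helper name `set_registry_pull_qps` is hypothetical, the config file is assumed to be plain JSON as in the command above, and the script would need to run on the node with permission to write the file.

```python
import json
from pathlib import Path


def set_registry_pull_qps(config_path: Path, qps: int = 0) -> dict:
    """Set registryPullQPS in a kubelet JSON config file and write it back."""
    config = json.loads(config_path.read_text())
    config["registryPullQPS"] = qps  # same key the jq expression sets
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On a node this would be called as `set_registry_pull_qps(Path("/etc/kubernetes/kubelet/config.json"))`, using the path from the shell workaround.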

jigisha620 commented 1 week ago

Another workaround is to set the value through the kubelet config section of the NodeConfig, something like this:

spec:
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        config:
          registry-qps: 0

ohookins commented 1 week ago

That's a better workaround actually. Thanks!

Note that the config file parameter names differ slightly from the flag names, though (registryPullQPS rather than registry-qps):

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        config:
          registryPullQPS: 0
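
To illustrate why the config key cannot simply be derived from the flag name, here is a small Python sketch. The helper `naive_camel_case` is hypothetical and only shows what a mechanical conversion would produce; the actual field name is the one given in the comment above.

```python
def naive_camel_case(flag: str) -> str:
    """Convert a flag such as '--registry-qps' to camelCase ('registryQps')."""
    head, *rest = flag.lstrip("-").split("-")
    return head + "".join(part.capitalize() for part in rest)


# Config field name taken from the comment above.
ACTUAL_FIELD = "registryPullQPS"

guess = naive_camel_case("--registry-qps")
print(guess, guess == ACTUAL_FIELD)  # registryQps False
```

Since the mechanical guess does not match, each flag's config-file equivalent has to be looked up rather than inferred.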