awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

Cloud-init unhandled unknown content-type (application/node.eks.aws) userdata #1963

Closed by amin-o 2 months ago

amin-o commented 2 months ago

What happened:

We are using Karpenter for node management and autoscaling. After migrating to AL2023, we noticed a warning in the system log. Specifically, when using the EC2NodeClass with custom userData as described below, the system log shows a warning about an unhandled unknown content type (application/node.eks.aws) in the user data.

[    8.950282] cloud-init[2183]: 2024-09-12 12:57:56,263 - __init__.py[WARNING]: Unhandled unknown content-type (application/node.eks.aws) userdata: 'b'# Karpenter Generated No'...'
[    8.953015] cloud-init[2183]: 2024-09-12 12:57:56,263 - __init__.py[WARNING]: Unhandled unknown content-type (application/node.eks.aws) userdata: 'b'---'...'

Despite this warning, inside the node, the configuration is applied correctly with the following:

/etc/kubernetes/kubelet/config.json.d/00-nodeadm.conf:

{
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "clusterDNS": [
      "172.20.0.10"
  ],
  "cpuCFSQuota": false,
  "kind": "KubeletConfiguration",
  "maxPods": 29,
  "registerWithTaints": [
      {
          "effect": "NoSchedule",
          "key": "general"
      }
  ]
}
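A quick way to confirm that a custom setting such as cpuCFSQuota reached the kubelet drop-in is to load the JSON and compare the key. This is only a minimal sketch: the helper name `setting_matches` is hypothetical, and the drop-in is inlined as a string so the example is self-contained. On a node you would read the file at the path shown above instead.

```python
import json

# Contents of the drop-in as reported in this issue, inlined for illustration.
DROP_IN = """
{
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "clusterDNS": ["172.20.0.10"],
  "cpuCFSQuota": false,
  "kind": "KubeletConfiguration",
  "maxPods": 29,
  "registerWithTaints": [{"effect": "NoSchedule", "key": "general"}]
}
"""


def setting_matches(config_text, key, expected):
    """Return True if the top-level key in the JSON text equals expected."""
    config = json.loads(config_text)
    return key in config and config[key] == expected


if __name__ == "__main__":
    print("cpuCFSQuota applied:", setting_matches(DROP_IN, "cpuCFSQuota", False))
```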

What you expected to happen:

The custom userData should be processed without warnings, ensuring proper handling of all content types.

How to reproduce it (as minimally and precisely as possible):

  1. Use the following EC2NodeClass specification:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  role: {{ .Values.karpenter.settings.nodeRole }}
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: {{ .Values.karpenter.settings.clusterName }}
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: {{ .Values.karpenter.settings.clusterName }}
  tags:
    karpenter.sh/discovery: {{ .Values.karpenter.settings.clusterName }}
    owner: {{ .Values.karpenter.settings.clusterName }}
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: application/node.eks.aws

    ---
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        config:
          cpuCFSQuota: false

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash

    set -eoux pipefail
    modprobe tls

    cat <<EOF >> /etc/sysctl.conf

    # tcp keepalive
    net.ipv4.tcp_keepalive_time = 300
    net.ipv4.tcp_keepalive_probes = 5
    net.ipv4.tcp_keepalive_intvl = 15

    net.ipv4.tcp_slow_start_after_idle=0

    # bbr tcp congestion control algorithm
    net.core.default_qdisc=fq
    net.ipv4.tcp_congestion_control=bbr

    EOF

    sysctl -p /etc/sysctl.conf

    --BOUNDARY--
  2. Observe the system log to see the warnings.
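To spot the warning without reading the whole log, the saved system log text can be scanned for the message. The sketch below assumes the log has already been captured as text (how you fetch it is up to you), and `unhandled_content_types` is a hypothetical helper name. The pattern is derived only from the two log lines quoted in this issue.

```python
import re

# Matches the cloud-init warning quoted in this issue and captures the content type.
WARNING_RE = re.compile(
    r"Unhandled unknown content-type \((?P<ctype>[^)]+)\) userdata"
)

# The two log lines from this issue, inlined for illustration.
SAMPLE_LOG = """\
[    8.950282] cloud-init[2183]: 2024-09-12 12:57:56,263 - __init__.py[WARNING]: Unhandled unknown content-type (application/node.eks.aws) userdata: 'b'# Karpenter Generated No'...'
[    8.953015] cloud-init[2183]: 2024-09-12 12:57:56,263 - __init__.py[WARNING]: Unhandled unknown content-type (application/node.eks.aws) userdata: 'b'---'...'
"""


def unhandled_content_types(log_text):
    """Return the sorted, de-duplicated content types named in the warnings."""
    return sorted({m.group("ctype") for m in WARNING_RE.finditer(log_text)})


if __name__ == "__main__":
    print(unhandled_content_types(SAMPLE_LOG))
```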

Other:

Here is the link to the original issue I opened on the aws/karpenter-provider-aws repository: #6989

Environment:

amin-o commented 2 months ago

User data from the EC2 instance:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

# Karpenter Generated NodeConfig
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
metadata:
  creationTimestamp: null
spec:
  cluster:
    apiServerEndpoint: REDACTED
    certificateAuthority: REDACTED
    cidr: 172.20.0.0/16
    name: REDACTED
  containerd: {}
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
      - 172.20.0.10
      maxPods: 58
      registerWithTaints:
      - effect: NoSchedule
        key: general
    flags:
    - --node-labels="karpenter.sh/capacity-type=spot,karpenter.sh/nodepool=general"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    config:
      cpuCFSQuota: false

--BOUNDARY
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash

set -eoux pipefail
modprobe tls

cat <<EOF >> /etc/sysctl.conf

# tcp keepalive
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

net.ipv4.tcp_slow_start_after_idle=0

# bbr tcp congestion control algorithm
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr

EOF

sysctl -p /etc/sysctl.conf

--BOUNDARY--

cartermckinnon commented 2 months ago

You can ignore this log message. It just means cloud-init is going to skip that part of the MIME document, which is intended. The NodeConfig objects are retrieved by nodeadm: https://github.com/awslabs/amazon-eks-ami/tree/main/nodeadm
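To illustrate the split described here, the sketch below parses a trimmed copy of the user data with Python's standard `email` module and groups the parts by content type. It is not the cloud-init or nodeadm implementation. The set `CLOUD_INIT_TYPES` is a deliberately partial assumption about what cloud-init handles, and `split_parts` is a hypothetical helper name.

```python
from email import message_from_string

# Assumption: a partial, illustrative list of types cloud-init handles itself.
CLOUD_INIT_TYPES = {"text/x-shellscript", "text/cloud-config"}
# Content type used for NodeConfig parts in this issue.
NODEADM_TYPE = "application/node.eks.aws"

# Trimmed version of the user data from this issue.
USER_DATA = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    config:
      cpuCFSQuota: false

--BOUNDARY
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
modprobe tls

--BOUNDARY--
"""


def split_parts(user_data):
    """Group MIME part bodies by the component expected to consume them."""
    groups = {"nodeadm": [], "cloud-init": [], "other": []}
    for part in message_from_string(user_data).walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        ctype = part.get_content_type()
        if ctype == NODEADM_TYPE:
            groups["nodeadm"].append(part.get_payload())
        elif ctype in CLOUD_INIT_TYPES:
            groups["cloud-init"].append(part.get_payload())
        else:
            groups["other"].append(part.get_payload())
    return groups


if __name__ == "__main__":
    for consumer, bodies in split_parts(USER_DATA).items():
        print(consumer, len(bodies))
```

The point of the sketch is that both consumers read the same document and each takes only the parts it recognizes, so a warning from one about a part meant for the other does not indicate a failure.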

amin-o commented 2 months ago

Thank you for the clarification.