awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/

bug(nodeadm): containerd BaseRuntimeSpec config not merged correctly #1928

Open vedantrathore opened 3 weeks ago

vedantrathore commented 3 weeks ago

What happened: We set custom rlimits for containerd containers; as mentioned in the documentation, this can be done by passing a custom NodeConfig in the user data that overrides the base runtime spec. However, we use Karpenter to provision nodes, and it adds its own NodeConfig to the user data to establish cluster connectivity. When we append our custom NodeConfig to that user data, the baseRuntimeSpec key is not reflected in the merged NodeConfig.

User data script:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

# Karpenter Generated NodeConfig
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
metadata:
  creationTimestamp: null
spec:
  cluster:
    apiServerEndpoint: xxxxxxxxxx
    certificateAuthority: xxxxxxxxxxx
    cidr: xxxxxxxxxx
    name: xxxxxxxxxx
  containerd: {}
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
      - xxxxxxxxxx
      maxPods: xx
    flags:
    - --node-labels="karpenter.sh/capacity-type=on-demand"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  containerd:
    baseRuntimeSpec:
      process:
        rlimits:
          - type: RLIMIT_NOFILE
            soft: 1024
            hard: 1024

--BOUNDARY--

What you expected to happen: The expected merged NodeConfig:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: xxxxxxxxxx
    certificateAuthority: xxxxxxxxxxx
    cidr: xxxxxxxxxx
    name: xxxxxxxxxx
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
      - xxxxxxxxxx
      maxPods: xx
    flags:
    - --node-labels="karpenter.sh/capacity-type=on-demand"
  containerd:
    baseRuntimeSpec:
      process:
        rlimits:
          - type: RLIMIT_NOFILE
            soft: 1024
            hard: 1024

But the actual result is:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: xxxxxxxxxx
    certificateAuthority: xxxxxxxxxxx
    cidr: xxxxxxxxxx
    name: xxxxxxxxxx
  instance:
    localStorage: {}
  containerd: {}
  kubelet:
    config:
      clusterDNS:
      - xxxxxxxxxx
      maxPods: xx
    flags:
    - --node-labels="karpenter.sh/capacity-type=on-demand"

How to reproduce it (as minimally and precisely as possible):

Pass the NodeConfig above as the user data of any EC2 instance, SSH in, and run nodeadm init. The merged NodeConfig should be visible in the logs.

Anything else we need to know?: I dug into the nodeadm source code to understand why this happens. My understanding is that when config.Merge(nodeConfig) is called here, the merger is set up with custom transformers via mergo.WithTransformers(nodeConfigTransformer{})

The transformer registers custom merge functions for the containerd and kubelet options using type reflection, but for the containerd options it only handles the Config field, not BaseRuntimeSpec. I reused the logic from transformKubeletConfig and tested the following change to the transformer to confirm that the merge then works as expected. Please let me know if this makes sense; I'd be happy to send out a PR for this fix.

const (
    kubeletFlagsName  = "Flags"
    kubeletConfigName = "Config"

    containerdBaseRuntimeSpecName = "BaseRuntimeSpec"
    containerdConfigName          = "Config"
)

type nodeConfigTransformer struct{}

func (t nodeConfigTransformer) Transformer(typ reflect.Type) func(dst, src reflect.Value) error {
    if typ == reflect.TypeOf(ContainerdOptions{}) {
        return func(dst, src reflect.Value) error {
            t.transformContainerdConfig(
                dst.FieldByName(containerdConfigName),
                src.FieldByName(containerdConfigName),
            )

            if err := t.transformContainerdBaseRuntimeSpec(
                dst.FieldByName(containerdBaseRuntimeSpecName),
                src.FieldByName(containerdBaseRuntimeSpecName),
            ); err != nil {
                return err
            }

            return nil
        }
    } else if typ == reflect.TypeOf(KubeletOptions{}) {
        return func(dst, src reflect.Value) error {
            t.transformKubeletFlags(
                dst.FieldByName(kubeletFlagsName),
                src.FieldByName(kubeletFlagsName),
            )

            if err := t.transformKubeletConfig(
                dst.FieldByName(kubeletConfigName),
                src.FieldByName(kubeletConfigName),
            ); err != nil {
                return err
            }

            return nil
        }
    }
    return nil
}
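
For illustration, here is a minimal, self-contained sketch of the merge behavior I'd expect. This is not nodeadm's actual code: nodeSpec, containerdOptions, and demoTransformer are hypothetical stand-ins for the real API types, and the transformer only handles BaseRuntimeSpec for brevity. It shows a mergo transformer deep-merging the baseRuntimeSpec map from the second document into the first instead of dropping it:

// demo.go: a minimal sketch (not nodeadm's code) of the intended merge behavior.
// nodeSpec, containerdOptions, and demoTransformer are hypothetical stand-ins.
package main

import (
    "fmt"
    "reflect"

    "dario.cat/mergo"
)

type containerdOptions struct {
    Config          string
    BaseRuntimeSpec map[string]any
}

type nodeSpec struct {
    Containerd containerdOptions
}

type demoTransformer struct{}

// Transformer takes over merging for containerdOptions values. For brevity it
// only deep-merges the BaseRuntimeSpec map; the real nodeConfigTransformer
// would also need to handle the Config field, as in the snippet above.
func (demoTransformer) Transformer(typ reflect.Type) func(dst, src reflect.Value) error {
    if typ != reflect.TypeOf(containerdOptions{}) {
        return nil
    }
    return func(dst, src reflect.Value) error {
        dstSpec := dst.FieldByName("BaseRuntimeSpec")
        srcSpec := src.FieldByName("BaseRuntimeSpec")
        if srcSpec.Len() == 0 {
            return nil // nothing to merge in
        }
        if dstSpec.IsNil() {
            dstSpec.Set(reflect.MakeMap(srcSpec.Type()))
        }
        // Copy every top-level key from src into dst, overriding on conflict.
        for _, key := range srcSpec.MapKeys() {
            dstSpec.SetMapIndex(key, srcSpec.MapIndex(key))
        }
        return nil
    }
}

func main() {
    // dst mirrors the Karpenter-generated document: "containerd: {}".
    dst := nodeSpec{Containerd: containerdOptions{}}
    // src mirrors the second document carrying the custom baseRuntimeSpec.
    src := nodeSpec{Containerd: containerdOptions{
        BaseRuntimeSpec: map[string]any{
            "process": map[string]any{
                "rlimits": []any{
                    map[string]any{"type": "RLIMIT_NOFILE", "soft": 1024, "hard": 1024},
                },
            },
        },
    }}
    if err := mergo.Merge(&dst, src, mergo.WithOverride, mergo.WithTransformers(demoTransformer{})); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", dst.Containerd.BaseRuntimeSpec) // the rlimits survive the merge
}

Running this prints a map containing the RLIMIT_NOFILE entry, mirroring the expected merged NodeConfig above rather than the empty containerd: {} we currently get.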
cartermckinnon commented 3 weeks ago

@vedantrathore makes sense to me, feel free to open a PR!

vedantrathore commented 3 weeks ago

Sounds good. Would there be a different merge strategy in the case of transformContainerdBaseRuntimeSpec? Currently I just reused the same code as transformKubeletConfig.

cartermckinnon commented 3 weeks ago

@vedantrathore makes sense to me, you can add a test case to confirm
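
For example, reusing the hypothetical demo types from the sketch above, such a test case could look roughly like the following; the real test would exercise nodeadm's actual NodeConfig merge path rather than these stand-in types:

// merge_test.go: a hedged sketch of the suggested test case, written against
// the hypothetical demo types above rather than the real nodeadm API.
package main

import (
    "testing"

    "dario.cat/mergo"
)

func TestBaseRuntimeSpecSurvivesMerge(t *testing.T) {
    // dst is the Karpenter-style document ("containerd: {}").
    dst := nodeSpec{Containerd: containerdOptions{}}
    // src carries the custom rlimits via baseRuntimeSpec.
    src := nodeSpec{Containerd: containerdOptions{
        BaseRuntimeSpec: map[string]any{
            "process": map[string]any{
                "rlimits": []any{
                    map[string]any{"type": "RLIMIT_NOFILE", "soft": 1024, "hard": 1024},
                },
            },
        },
    }}

    if err := mergo.Merge(&dst, src, mergo.WithOverride, mergo.WithTransformers(demoTransformer{})); err != nil {
        t.Fatal(err)
    }
    if _, ok := dst.Containerd.BaseRuntimeSpec["process"]; !ok {
        t.Fatalf("expected baseRuntimeSpec.process to survive the merge, got %+v", dst.Containerd.BaseRuntimeSpec)
    }
}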