Azure / karpenter-provider-azure

AKS Karpenter Provider
Apache License 2.0

Mega Issue: Supported AKS Kubelet Configuration #196

Open Bryce-Soghigian opened 3 months ago

Bryce-Soghigian commented 3 months ago

Tell us about your request

Karpenter Core plans to move the kubelet configuration out of the core API and have cloud providers maintain their own set of supported kubelet configuration in the v1 API. The AKS provider needs a migration plan so that we can keep our core version closely synced with upstream.

AKS also supports the following kubelet configuration: https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration?tabs=linux-node-pools#kubelet-configuration

We need to start implementing propagation for all of the AKS-supported kubelet configuration so that Karpenter has feature parity. Perhaps we can also drive further discussion of other kubelet configuration features customers want, and let them configure those through Karpenter before rolling them out to the wider AKS audience.

This issue has been created to track:

  1. the migration plan to move away from cloud-neutral core configuration
  2. each of the supported kubelet config fields we want to expose

Attachments

See: https://github.com/kubernetes-sigs/karpenter/issues/758#issuecomment-1971592904

Slack thread where we talked about moving away from cloud-neutral configuration: https://kubernetes.slack.com/archives/C04JW2J5J5P/p1709226455964629

Tomasz-Kluczkowski commented 1 month ago

Is the kubelet configuration via karpenter node pool even working? I tried setting this on one of the node pools:

spec:
  kubelet:
    systemReserved:
      cpu: 1000m
    kubeReserved:
      cpu: 1000m

I then inspected the kubelet settings on the node and found that they are not applied at all.

I am not sure whether ps aufx | grep kubelet is the right command to check this. However, the scheduler placed a pod requesting 15000m CPU on a node that can allocate at most 15750m with no reservations, even though I wanted 2000m reserved, so it definitely does not work as expected.

root@aks-cpu-reserved-9bcbd:/# ps aufx | grep kubelet
root        3198  1.8  0.3 3039332 119056 ?      Ssl  10:12   0:10 
/usr/local/bin/kubelet --enable-server
--node-labels=karpenter.azure.com/sku-memory=32768,kubernetes.io/os=linux,kubernetes.azure.com/cluster=MC_karpenter-trial_karpenter-trial_uksouth,kubernetes.azure.com/mode=user,karpenter.azure.com/sku-gpu-count=0,kubernetes.azure.com/nodenetwork-vnetguid=f82592a3-317e-4e60-9d4d-23bcbf4f0e60,karpenter.azure.com/sku-storage-ephemeralos-maxsize=274.877906944,karpenter.azure.com/sku-name=Standard_F16s_v2,kubernetes.azure.com/role=agent,kubernetes.azure.com/podnetwork-type=overlay,karpenter.sh/nodepool=cpu-reserved,karpenter.azure.com/sku-encryptionathost-capable=true,karpenter.sh/capacity-type=spot,node.kubernetes.io/instance-type=Standard_F16s_v2,karpenter.azure.com/sku-family=F,topology.kubernetes.io/region=uksouth,karpenter.azure.com/sku-networking-accelerated=true,kubernetes.azure.com/network-subnet=aks-subnet,kubernetes.azure.com/ebpf-dataplane=cilium,kubernetes.io/arch=amd64,karpenter.azure.com/sku-storage-premium-capable=true,karpenter.azure.com/sku-version=2,karpenter.azure.com/sku-cpu=16
--v=2 
--volume-plugin-dir=/etc/kubernetes/volumeplugins
--kubeconfig /var/lib/kubelet/kubeconfig 
--bootstrap-kubeconfig /var/lib/kubelet/bootstrap-kubeconfig 
--runtime-request-timeout=15m 
--container-runtime-endpoint=unix:///run/containerd/containerd.sock 
--runtime-cgroups=/system.slice/containerd.service 
--cgroup-driver=systemd 
--max-pods=250 
--authentication-token-webhook=true 
--rotate-certificates=true 
--authorization-mode=Webhook 
--pod-max-pids=-1 
--event-qps=0 
--register-with-taints=cpu-reserved=true:NoSchedule 
--kube-reserved=cpu=260m,memory=3645Mi 
--cluster-dns=10.0.0.10 
--image-gc-high-threshold=85 
--tls-private-key-file=/etc/kubernetes/certs/kubeletserver.key
--node-status-update-frequency=10s
--keep-terminated-pod-volumes=false
--kubeconfig=/var/lib/kubelet/kubeconfig
--pod-manifest-path=/etc/kubernetes/manifests
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
--enforce-node-allocatable=pods
--azure-container-registry-config=/etc/kubernetes/azure.json
--protect-kernel-defaults=true
--cluster-domain=cluster.local
--client-ca-file=/etc/kubernetes/certs/ca.crt
--cloud-config=/etc/kubernetes/azure.json
--pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6
--tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt
--cloud-provider=external
--eviction-hard=memory.available<750Mi
--cgroups-per-qos=true
--address=0.0.0.0
--streaming-connection-idle-timeout=4h
--resolv-conf=/run/systemd/resolve/resolv.conf
--read-only-port=0
--system-reserved=memory=0,cpu=0
--anonymous-auth=false
--image-gc-low-threshold=80

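
For readers who want to repeat this check without scanning the full command line by eye, the comparison can be sketched in Python. This is only an illustration, not part of Karpenter or AKS tooling: the helper names parse_reserved and cpu_millicores are hypothetical, the command line is shortened to the relevant flags from the output above, and the requested value assumes the 1000m + 1000m from the NodePool spec in this comment.

```python
import shlex

# Flags that carry reserved resources on the kubelet command line.
RESERVED_FLAGS = ("--kube-reserved", "--system-reserved")


def parse_reserved(cmdline: str) -> dict[str, dict[str, str]]:
    """Return {flag: {resource: quantity}} for the reserved-resource flags."""
    found: dict[str, dict[str, str]] = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        if sep and name in RESERVED_FLAGS:
            found[name] = dict(
                pair.split("=", 1) for pair in value.split(",") if pair
            )
    return found


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '260m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


# Shortened copy of the kubelet command line shown above.
cmdline = (
    "/usr/local/bin/kubelet --max-pods=250 "
    "--kube-reserved=cpu=260m,memory=3645Mi "
    "--system-reserved=memory=0,cpu=0"
)

reserved = parse_reserved(cmdline)
total = sum(cpu_millicores(res.get("cpu", "0")) for res in reserved.values())

# Assumption: systemReserved.cpu (1000m) + kubeReserved.cpu (1000m) from the spec.
requested = 1000 + 1000

print(f"reserved on node: {total}m, requested in NodePool: {requested}m")
```

With the flags above, the node reserves 260m CPU in total, well short of the 2000m requested, which matches the observation that the NodePool values were not propagated.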
Bryce-Soghigian commented 1 month ago

https://learn.microsoft.com/en-gb/azure/aks/node-autoprovision?tabs=azure-cli#unsupported-features @Tomasz-Kluczkowski

Bryce-Soghigian commented 1 month ago

Azure Karpenter does reference the data structure internally for passing around kubelet configuration, but we do not allow customers to set the values yet.