amazonlinux / amazon-linux-2023

Amazon Linux 2023
https://aws.amazon.com/linux/amazon-linux-2023/

[Bug] - Cannot get Kubernetes CoreDNS working since al2023-ami-2023.2.20231002.0-kernel-6.1-x86_64 #528

Closed gabrielbull closed 11 months ago

gabrielbull commented 11 months ago

Describe the bug Starting with al2023-ami-2023.2.20231002.0-kernel-6.1-x86_64 (ami-036f5574583e16426), setting up a fresh Kubernetes cluster does not work: CoreDNS cannot be brought up with either Calico or Flannel.

Why I consider this an amazon-linux-2023 bug and not a CoreDNS bug: the same setup works on every other Linux distro and on every amazon-linux-2023 version before al2023-ami-2023.2.20231002.0-kernel-6.1-x86_64, so the bug was introduced by changes to amazon-linux-2023. Linked issue on the CoreDNS repo, where one of the maintainers believes this is not a CoreDNS issue: https://github.com/coredns/coredns/issues/6368

Affected versions:

To Reproduce Steps to reproduce the behavior:

sudo yum upgrade -y
sudo yum install -y nmap-ncat iproute-tc git
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

# Install containerd
sudo yum install containerd -y
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo sed -i 's/sandbox_image = "registry.k8s.io\/pause:.*/sandbox_image = "registry.k8s.io\/pause:3.8"/g' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable --now containerd

# Add kubernetes repo to yum
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# Install kubernetes
sudo yum install -y kubelet-1.27.3-0 kubeadm-1.27.3-0 kubectl-1.27.3-0 --disableexcludes=kubernetes
sudo systemctl restart kubelet
sudo systemctl enable --now kubelet
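
Before moving on to kubeadm init, it can help to rule out a mistake in the host preparation itself. The following is a minimal sketch, not part of the original report: check_k8s_sysctl_conf is a hypothetical helper that reads a sysctl config file and reports whether the three keys written above are set to 1. It only inspects the file and does not query the running kernel.

```shell
# Hypothetical helper (not from the original report): verify that the three
# sysctl keys required by the setup above are set to 1 in a config file.
check_k8s_sysctl_conf() {
    conf_file="$1"
    status=0
    for key in net.bridge.bridge-nf-call-iptables \
               net.bridge.bridge-nf-call-ip6tables \
               net.ipv4.ip_forward; do
        # Split each line on '=', strip spaces and tabs from key and value,
        # and keep the last value assigned to this key.
        value=$(awk -F= -v k="$key" '
            { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
            $1 == k { v = $2 }
            END { print v }' "$conf_file")
        if [ "$value" = "1" ]; then
            echo "ok: $key = 1"
        else
            echo "MISSING: $key (found: '$value')"
            status=1
        fi
    done
    return "$status"
}

# Usage, assuming the file path used in the steps above:
# check_k8s_sysctl_conf /etc/sysctl.d/k8s.conf
```

If this reports all three keys as ok on both a working and a failing AMI, the difference is unlikely to come from the sysctl step.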

Steps to reproduce with calico:

sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Install calico
sudo curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.3/manifests/calico.yaml -O
kubectl apply -f calico.yaml --kubeconfig=/etc/kubernetes/admin.conf

Steps to reproduce with flannel:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Install flannel
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Expected behavior CoreDNS starts, as it does on previous Amazon Linux 2023 versions.

Actual behavior CoreDNS hangs trying to fetch
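
One way to compare expected and actual behavior without screenshots is to list the CoreDNS pods and filter for those that never become ready. This is a rough sketch under stated assumptions: pods_not_ready is a hypothetical helper, it expects the default kubectl get pods column layout (NAME, READY, STATUS, ...), and the usage line assumes the k8s-app=kube-dns label that kubeadm applies to its CoreDNS deployment.

```shell
# Hypothetical helper (not from the original report): read `kubectl get pods`
# output on stdin, skip the header row, and print NAME, READY and STATUS for
# every pod whose READY column (ready/total) has two different numbers.
pods_not_ready() {
    awk 'NR > 1 {
        split($2, r, "/")
        if (r[1] != r[2]) print $1, $2, $3
    }'
}

# Usage on the control-plane node:
# kubectl get pods -n kube-system -l k8s-app=kube-dns \
#     --kubeconfig=/etc/kubernetes/admin.conf | pods_not_ready
```

On a healthy cluster the filter prints nothing; on an affected AMI the CoreDNS pods would be expected to show up here.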

Screenshots

Expected screenshot from al2023-ami-2023.2.20230920.1-kernel-6.1-x86_64 (ami-0d406e26e5ad4de53):

[Screenshot: 277453123-795aa57a-bbbe-4aeb-8b6e-fe79b73a2dfb]

Actual screenshot from al2023-ami-2023.2.20231002.0-kernel-6.1-x86_64 (ami-036f5574583e16426):

[Screenshot, 2023-10-23 at 15 27 05]

ncopa commented 11 months ago

This is probably due to this: https://github.com/amazonlinux/amazon-ec2-net-utils/issues/97

gabrielbull commented 11 months ago

@ncopa Yeah looks like it. Pretty nasty bug.

nmeyerhans commented 11 months ago

This is fixed as of the 2023.2.20231030 release, which was just published.
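
For anyone auditing a fleet, the version range from this thread can be sketched as a small check. This is an illustration, not an official tool: ami_name_to_release, version_ge and al2023_release_affected are hypothetical helper names, the range (2023.2.20231002 up to but not including 2023.2.20231030) is inferred from the comments above, and treating every release in between as affected is an assumption. It relies on sort -V, which is available in GNU coreutils.

```shell
# Hypothetical helpers (not from the thread).

# Extract the release version from an AMI name such as
# al2023-ami-2023.2.20231002.0-kernel-6.1-x86_64.
ami_name_to_release() {
    printf '%s\n' "$1" | sed -n 's/^al2023-ami-\([0-9.]*\)-kernel-.*$/\1/p'
}

# version_ge A B: succeed when A >= B in version-sort order.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when the release is at or after the first reported bad release
# and before the release reported as fixed. Assumption: all releases in
# between are affected.
al2023_release_affected() {
    version_ge "$1" "2023.2.20231002" && ! version_ge "$1" "2023.2.20231030"
}

# Usage:
# release=$(ami_name_to_release "al2023-ami-2023.2.20231002.0-kernel-6.1-x86_64")
# if al2023_release_affected "$release"; then echo "$release: affected"; fi
```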