Open andy108369 opened 1 year ago
It looks like the alternative fix alone is not enough.
I just had the issue where nodelocaldns would keep crashing in CrashLoopBackOff
# kubectl -n kube-system logs nodelocaldns-v5s45
2023/03/20 14:26:13 [INFO] Starting node-cache image: 1.21.1
2023/03/20 14:26:13 [INFO] Using Corefile /etc/coredns/Corefile
2023/03/20 14:26:13 [INFO] Using Pidfile
2023/03/20 14:26:13 [ERROR] Failed to read node-cache coreFile /etc/coredns/Corefile.base - open /etc/coredns/Corefile.base: no such file or directory
2023/03/20 14:26:13 [INFO] Skipping kube-dns configmap sync as no directory was specified
cluster.local.:53 on 169.254.25.10
in-addr.arpa.:53 on 169.254.25.10
ip6.arpa.:53 on 169.254.25.10
.:53 on 169.254.25.10
[INFO] plugin/reload: Running configuration MD5 = adf97d6b4504ff12113ebb35f0c6413e
CoreDNS-1.7.0
linux/amd64, go1.16.8,
[FATAL] plugin/loop: Loop (10.233.90.150:43697 -> 169.254.25.10:53) detected for zone "ip6.arpa.", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 4300695667419388152.5105250931155992130.ip6.arpa."
The only fix was removing the bad search domains by configuring netplan the way mentioned in the original post (disabling the domains config via DHCP) and restarting the kube-system pods.
CrashLoopBackOff nodelocaldns pods
The issue is very well explained here: https://github.com/kubernetes-sigs/kubespray/issues/9948#issuecomment-1533876540
This is for quick verification, instead of waiting for kubespray to finish, which can take up to 1 hour.
forward . /etc/resolv.conf {
TO:
forward . 8.8.8.8 {
replace 8.8.8.8 with your preferred DNS servers
kubectl -n kube-system delete pods -l k8s-app=kube-dns
kubectl -n kube-system delete pods -l k8s-app=nodelocaldns
Set the upstream_dns_servers in your inventory and kubespray your env again:
$ cd kubespray
kubespray$ grep -A2 upstream_dns_servers inventory/akash/group_vars/all/all.yml
upstream_dns_servers:
- 8.8.8.8
- 8.8.4.4
source venv/bin/activate
ansible-playbook -i inventory/akash/hosts.yaml -b -v cluster.yml
Verify
kubectl -n kube-system get cm coredns -o yaml | grep forward
Bounce the coredns and nodelocaldns pods in this order:
kubectl -n kube-system delete pods -l k8s-app=kube-dns
kubectl -n kube-system delete pods -l k8s-app=nodelocaldns
Verify all pods are in Running state:
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system get pods -l k8s-app=nodelocaldns
K8s DNS resolution breaks in pods when one uses DHCP / has a bad DNS search domain configured.
/etc/resolv.conf file (kubelet does this);
SERVFAIL error (host google.com, dig google.com, nslookup google.com);
/etc/resolv.conf file;
accept-ra is enabled in the netplan by default => refs https://bugs.launchpad.net/netplan/+bug/1858503
The working netplan config:
We should document this case and give users the verification steps so they can verify their DNS is working properly once they set up the K8s cluster.
Alternative fix
Provider owner can also change dnsPolicy from Default to ClusterFirst for the coredns deployment & nodelocaldns daemonset, which will fix this behavior even when a bad DNS search domain is present in the /etc/resolv.conf file:
More about the dnsPolicy -> https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
Also worth reading https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
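A scripted form of the "Verify" step above, added here as an example and not taken from the issue or from kubespray: it lists every forward target in a Corefile per server block and fails if any still points at /etc/resolv.conf, and it prints the search list a node's resolv.conf would hand down. The function names, the sample Corefiles and the sample resolv.conf are constructed for illustration.

```shell
# Added example (not from the issue): audit DNS config for the failure mode above.
# audit_corefile: read a Corefile on stdin, print "<server block> -> <upstreams>"
# for each forward directive; return 1 if any forward uses /etc/resolv.conf.
audit_corefile() {
  awk '
    BEGIN { bad = 0 }
    # A server block header sits at brace depth 0 and ends with "{".
    depth == 0 && /[{][ \t]*$/ { block = $1 }
    # Syntax: forward FROM TO... [{]  -- collect the TO fields.
    $1 == "forward" {
      ups = ""
      for (i = 3; i <= NF; i++) if ($i != "{") ups = (ups == "" ? $i : ups " " $i)
      note = ""
      if (ups ~ /\/etc\/resolv\.conf/) { bad = 1; note = "  <-- inherits node resolv.conf" }
      print block " -> " ups note
    }
    { depth += gsub(/[{]/, "{") - gsub(/[}]/, "}") }
    END { exit bad }
  '
}

# search_domains: read a resolv.conf on stdin, print the effective search list
# (the last "search" line wins, per resolv.conf(5)).
search_domains() {
  awk '$1 == "search" { $1 = ""; sub(/^ +/, ""); last = $0 } END { print last }'
}

# Constructed sample: Corefile before the fix, forwarding to the node resolv.conf.
COREFILE_BEFORE='cluster.local:53 {
    forward . 10.233.0.3
}
.:53 {
    forward . /etc/resolv.conf {
        prefer_udp
    }
}'
# Constructed sample: the default block after setting upstream_dns_servers.
COREFILE_AFTER='.:53 {
    forward . 8.8.8.8 8.8.4.4 {
        prefer_udp
    }
}'
# Constructed sample: node resolv.conf with a DHCP-supplied search domain.
RESOLV_SAMPLE='nameserver 192.0.2.53
search dhcp.example.invalid'

printf '%s\n' "$COREFILE_BEFORE" | audit_corefile || echo "forward still uses /etc/resolv.conf"
```

Against a live cluster, pipe the ConfigMap into the function, for example `kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}' | audit_corefile`, assuming the ConfigMap stores the file under the `Corefile` key.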