canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

DNS fails to lookup ip #3648

Open jsemohub opened 1 year ago

jsemohub commented 1 year ago

Summary

After enabling the dns plugin, I log in to one of the pods and run nslookup google.com. The command fails.

What Should Happen Instead?

It should return a set of Google IP addresses.

Reproduction Steps

  1. Create a microk8s cluster with 3 nodes (Fedora 36). Each node has multiple interfaces (public and private nets).
  2. Enable the dns plugin.
  3. Log in to one of the pods and run nslookup google.com.

Introspection Report

Here are session outputs

DNS fails to lookup ip

    k exec -it nginx-ingress-microk8s-controller-m8ddh -n ingress -- bash
    bash-5.1$ nslookup google.com
    Server:    10.152.183.10
    Address:   10.152.183.10:53

    ** server can't find google.com: SERVFAIL

It doesn't work. Here is the pod's resolv.conf:

    cat /etc/resolv.conf
    search ingress.svc.cluster.local svc.cluster.local cluster.local
    nameserver 10.152.183.10
    options ndots:5

    k get ConfigMap coredns -n kube-system -o yaml
    apiVersion: v1
    data:
      Corefile: |
        .:53 {
            errors
            health {
              lameduck 5s
            }
            ready
            log . {
              class error
            }
            kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            forward . 8.8.8.8 8.8.4.4
            cache 30
            loop
            reload
            loadbalance
        }
    kind: ConfigMap
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health {\n lameduck 5s\n }\n ready\n log . {\n class error\n }\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . 8.8.8.8 8.8.4.4\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system"}}
      creationTimestamp: "2023-01-05T00:41:09Z"
      labels:
        addonmanager.kubernetes.io/mode: EnsureExists
        k8s-app: kube-dns
      name: coredns
      namespace: kube-system
      resourceVersion: "259864"
      uid: 1b6fc72b-eabd-4139-b2de-9764d08b6553

Can you suggest a fix?

Are you interested in contributing with a fix?

jsemohub commented 1 year ago

More output from the coredns pod

    [ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. AAAA: read udp 10.1.81.154:41110->8.8.4.4:53: read: no route to host
    [INFO] 10.1.81.153:39653 - 42585 "A IN acme-v02.api.letsencrypt.org. udp 57 false 1232" - - 0 2.001233069s
    [ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. A: read udp 10.1.81.154:35430->8.8.4.4:53: i/o timeout
    [INFO] 10.1.81.153:46040 - 52173 "AAAA IN acme-v02.api.letsencrypt.org. udp 57 false 1232" - - 0 2.000694382s
    [ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. AAAA: read udp 10.1.81.154:47142->8.8.4.4:53: i/o timeout
    [INFO] 10.1.81.153:35909 - 54411 "A IN acme-v02.api.letsencrypt.org. udp 57 false 1232" - - 0 2.000906637s
    [ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. A: read udp 10.1.81.154:60004->8.8.4.4:53: i/o timeout

Meanwhile, I can easily look up the address on the server itself using Google DNS:

    nslookup google.com 8.8.4.4
    Server:    8.8.4.4
    Address:   8.8.4.4#53

    Non-authoritative answer:
    Name:      google.com
    Address:   142.250.72.110
    Name:      google.com
    Address:   2607:f8b0:4006:816::200e

jsemohub commented 1 year ago

Opening port 53/udp brings us to the next issue:

    [INFO] 10.1.81.153:46654 - 19085 "A IN acme-v02.api.letsencrypt.org. udp 57 false 1232" - - 0 0.000382567s
    [ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. A: read udp 10.1.81.154:58691->8.8.8.8:53: read: no route to host
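With many such lines, it can help to tally which upstream fails and why. The following is a minimal sketch, not part of MicroK8s or CoreDNS: the regular expression only assumes the line shape seen in the logs above, and `summarize_forward_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches CoreDNS "errors" plugin lines of the shape quoted in this thread.
# The pattern is an assumption based on those samples only.
ERROR_RE = re.compile(
    r"\[ERROR\] plugin/errors: (?P<rcode>\d+) (?P<name>\S+) (?P<qtype>\w+): "
    r"read udp (?P<src>\S+)->(?P<dst>\S+): (?P<reason>.+)"
)


def summarize_forward_errors(lines):
    """Count failures per (upstream address, reason) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("dst"), match.group("reason").strip())] += 1
    return counts


# Sample lines copied from the logs in this thread.
SAMPLE = [
    "[ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. AAAA: read udp 10.1.81.154:41110->8.8.4.4:53: read: no route to host",
    '[INFO] 10.1.81.153:39653 - 42585 "A IN acme-v02.api.letsencrypt.org. udp 57 false 1232" - - 0 2.001233069s',
    "[ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. A: read udp 10.1.81.154:35430->8.8.4.4:53: i/o timeout",
    "[ERROR] plugin/errors: 2 acme-v02.api.letsencrypt.org. A: read udp 10.1.81.154:58691->8.8.8.8:53: read: no route to host",
]
```

Feeding it the saved output of the coredns pod log groups the failures by upstream, which makes it easier to see whether every forwarder is unreachable or only one.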

ktsakalozos commented 1 year ago

Hi @jsemohub, here is a suggestion: let's try to get coredns to use the host to forward requests. We switched to this approach in v1.26, so one option is to set up your cluster with v1.26 (snap install microk8s --classic --channel=1.26). To do this in a pre-1.26 cluster:

jsemohub commented 1 year ago

I am using 1.26 channel BTW. Updated and restarted.

Still no connection.

    [root@node1 k8s]# k exec -it pod/nginx-ingress-microk8s-controller-jnwph -n ingress -- bash
    bash-5.1$ cat /etc/resolv.conf
    search ingress.svc.cluster.local svc.cluster.local cluster.local
    nameserver 10.152.183.10
    options ndots:5
    bash-5.1$ nslookup google.com
    nslookup: read: Host is unreachable
    nslookup: read: Host is unreachable
    nslookup: read: Host is unreachable
    ^C
    bash-5.1$ nslookup google.com 8.8.8.8
    nslookup: write to '8.8.8.8': Host is unreachable
    ;; connection timed out; no servers could be reached

I think the pods have no access to outside IPs at all, since they can't even reach Google's public DNS at 8.8.8.8. Is there a quick way to troubleshoot this or add a calico static route?
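One way to separate a DNS failure from a routing failure is to send a single raw UDP query to each server and see whether anything comes back. This is a minimal sketch using only the Python standard library; `build_query` and `probe` are hypothetical helper names, and it assumes Python is available wherever the check is run (the ingress controller image may not ship it).

```python
import socket
import struct


def build_query(name, txid=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server, name="google.com", timeout=2.0):
    """Send one UDP query to server:53 and describe what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError as exc:  # covers timeouts and "no route to host"
            return f"{server}: failed ({exc})"
    if len(data) < 4:
        return f"{server}: malformed reply"
    # The response code is the low 4 bits of the flags field.
    rcode = struct.unpack("!H", data[2:4])[0] & 0x000F
    return f"{server}: answered, rcode={rcode}"
```

Calling `probe("10.152.183.10")` and `probe("8.8.8.8")` from the same place gives two data points: a reply with rcode 2 (SERVFAIL) means the cluster DNS is reachable but its upstream is not, while a failure on both suggests a routing or firewall problem rather than a DNS one.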

jsemohub commented 1 year ago

Two independent problems seem to be contributing to the issue:

  1. /var/snap/microk8s/current/args/kubelet sets --cluster-dns=10.152.183.10, but there is no route from 10.1.0.0 to the DNS server at 10.152.183.10.
  2. There is no route to outside IP addresses, so even if a request reached the DNS server, it could not be resolved via the Google DNS IPs.

Any help with this would be much appreciated.
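The addresses in the two points above fall into three different ranges, and keeping them apart helps when reading logs. The sketch below classifies an address with the standard `ipaddress` module. The pod range 10.1.0.0/16 appears in this thread; the service range 10.152.183.0/24 is an assumption inferred from the 10.152.183.x addresses seen here, and `classify` is a hypothetical helper.

```python
import ipaddress

POD_CIDR = ipaddress.ip_network("10.1.0.0/16")          # pod range mentioned in this thread
SERVICE_CIDR = ipaddress.ip_network("10.152.183.0/24")  # assumed service range, not confirmed here


def classify(addr):
    """Return 'pod', 'service', or 'external' for an IP address string."""
    ip = ipaddress.ip_address(addr)
    if ip in POD_CIDR:
        return "pod"
    if ip in SERVICE_CIDR:
        return "service"
    return "external"
```

For example, `classify("10.1.81.154")` gives `"pod"` (the coredns pod in the logs), `classify("10.152.183.10")` gives `"service"` (the cluster DNS address), and `classify("8.8.8.8")` gives `"external"`.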

jsemohub commented 1 year ago

Routing issues persist on machines with 3 NICs. On these boxes, even after a fresh microk8s install, the calico setup does not seem to work, and consequently neither does DNS.

jsemohub commented 1 year ago

The mystery has been solved. For some reason, 10.1.0.0/16 firewall zone was NOT created by calico on some servers automatically. After manually adding a microk8s-cluster zone, I am able to proceed with creating ha-cluster. Keeping fingers crossed.
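The exact commands used are not given in this thread. As a rough sketch only, assuming firewalld is the firewall in use and that the zone should accept traffic from the pod range, the steps could look like the command list built below. The zone name and source range come from the comment above; the ACCEPT target and the helper name `firewalld_zone_commands` are assumptions.

```python
import shlex


def firewalld_zone_commands(zone="microk8s-cluster", source="10.1.0.0/16", target="ACCEPT"):
    """Return firewall-cmd invocations (as argv lists) that create the zone.

    Nothing is executed here; the caller decides whether to run them.
    The ACCEPT target is an assumption, not something stated in the thread.
    """
    return [
        ["firewall-cmd", "--permanent", f"--new-zone={zone}"],
        ["firewall-cmd", "--permanent", f"--zone={zone}", f"--set-target={target}"],
        ["firewall-cmd", "--permanent", f"--zone={zone}", f"--add-source={source}"],
        ["firewall-cmd", "--reload"],
    ]


def as_shell_lines(commands):
    """Render the argv lists as copy-pasteable shell lines."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing `as_shell_lines(firewalld_zone_commands())` shows the commands for review before anything is run as root on a node. Comparing the result against a node where the zone was created automatically would be a sensible check.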

jsemohub commented 1 year ago

Got quite a bit further. Got stuck on cert-manager.

    ping cert-manager-webhook.cert-manager.svc
    PING cert-manager-webhook.cert-manager.svc (10.152.183.186): 56 data bytes

Will open another ticket...

al8ba commented 10 months ago

This process (with some extra failsafe logic) is in the latest dns enable script found at https://github.com/canonical/microk8s-core-addons/blob/main/addons/dns/enable

The find-resolv-conf.py at the heart of that script has one major issue with IPv6: to work correctly with IPv6 addresses that carry scope ids it needs python > 3.8, which, sadly, microk8s 1.27/stable and 1.28/stable (afaik) do not have. This means that on a system here, /run/systemd/resolve/resolv.conf contains the following:

    nameserver 192.168.121.1
    nameserver fe80::5054:ff:fe00:b61d%2

and find-resolv-conf.py judges these as unsafe, not because there is a loopback address, but because ipaddress.IPv6Address('fe80::5054:ff:fe00:b61d%2') explodes hideously on python 3.8 with an AddressValueError.
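One possible workaround is to strip the scope id before handing the address to the `ipaddress` module, which then behaves the same on python 3.8 and on later versions. This is a minimal sketch of that idea, not the add-on's actual code; `parse_nameserver` and `is_loopback_nameserver` are hypothetical names.

```python
import ipaddress


def parse_nameserver(value):
    """Parse a resolv.conf nameserver value, tolerating an IPv6 scope id.

    Returns (address, scope), where scope is None when no '%...' suffix exists.
    """
    host, _, scope = value.partition("%")
    return ipaddress.ip_address(host), scope or None


def is_loopback_nameserver(value):
    """True only when the nameserver address itself is a loopback address."""
    addr, _scope = parse_nameserver(value)
    return addr.is_loopback
```

With this approach, `fe80::5054:ff:fe00:b61d%2` parses as a link-local address with scope `2` and is not reported as loopback, while `127.0.0.53` and `::1` still are. Whether a link-local upstream is actually usable from inside a pod is a separate question that this check does not answer.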