canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Pod's access to the internet suddenly stopped working, DNS resolution fails #4459

Open McToel opened 8 months ago

McToel commented 8 months ago

Summary

Suddenly, pods can no longer access the internet. When I try to curl google.com from inside a pod, it fails with "connection reset by peer" or a different error. On some very rare occasions it returns a result, but one that differs from what curl google.com returns on the host machine. The DNS add-on is enabled, and MicroK8s is up to date and running on an Ubuntu Server 22.04 host.

I have made the following observations running commands in pods:

Running nslookup returns the same result for every external domain:

root@dnsutils:/# nslookup deb.debian.org
Server:     10.152.183.10
Address:    10.152.183.10#53

Non-authoritative answer:
Name:   deb.debian.org.fritz.box
Address: 45.76.93.104

Running host gives the same result as nslookup:

root@dnsutils:/# host deb.debian.org
deb.debian.org.fritz.box has address 45.76.93.104
deb.debian.org.fritz.box has IPv6 address 2001:19f0:6c00:1b0e:5400:4ff:fecd:7828

dig works fine, returning the correct IP address:

root@dnsutils:/# dig deb.debian.org

; <<>> DiG 9.9.5-9+deb8u19-Debian <<>> deb.debian.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41191
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;deb.debian.org.            IN  A

;; ANSWER SECTION:
deb.debian.org.     5   IN  CNAME   debian.map.fastlydns.net.
debian.map.fastlydns.net. 5 IN  A   146.75.122.132

;; Query time: 20 msec
;; SERVER: 10.152.183.10#53(10.152.183.10)
;; WHEN: Thu Mar 14 12:10:46 UTC 2024
;; MSG SIZE  rcvd: 135

Running curl against the correct IP address of google.com works and returns the correct result.

I have gone through the official Kubernetes DNS troubleshooting guide, but none of the errors mentioned there occurred. /etc/resolv.conf in the pods looks like this:

search default.svc.cluster.local svc.cluster.local cluster.local fritz.box
nameserver 10.152.183.10
options ndots:5

I guess that some part of my DNS configuration is incorrect, but I have not changed anything before the internet broke, and the DNS add-on should work out of the box as far as I understand.
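One possible reading of the outputs above, offered as a sketch rather than a confirmed diagnosis: with `options ndots:5`, a name with fewer than five dots is first tried with each search domain appended, and only then as written. nslookup and host apply the search list, while dig does not unless `+search` is passed, which would explain why only dig returned the expected record. The helper `expand_candidates` below is hypothetical and models a simplified glibc-style lookup order; other resolver libraries may behave differently.

```shell
#!/usr/bin/env bash
# Sketch of resolv.conf search-list expansion (simplified glibc-style order).
# Usage: expand_candidates NAME NDOTS SEARCH_DOMAIN...
expand_candidates() {
  local name=$1 ndots=$2
  shift 2
  # A trailing dot marks the name as fully qualified: no search list is applied.
  if [[ $name == *. ]]; then
    printf '%s\n' "$name"
    return
  fi
  # Count the dots in the name by deleting every other character.
  local only_dots=${name//[^.]/}
  local dots=${#only_dots}
  # With at least ndots dots, the name is tried as written first.
  if (( dots >= ndots )); then
    printf '%s.\n' "$name"
  fi
  # Each search domain is appended in the order listed in resolv.conf.
  local domain
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # With fewer dots than ndots, the name as written is tried last.
  if (( dots < ndots )); then
    printf '%s.\n' "$name"
  fi
}

# Values taken from the pod's /etc/resolv.conf shown above.
expand_candidates deb.debian.org 5 \
  default.svc.cluster.local svc.cluster.local cluster.local fritz.box
```

The fourth candidate printed is deb.debian.org.fritz.box., the name that nslookup and host reported. Querying with a trailing dot (nslookup deb.debian.org.) skips the search list, so comparing the two results is a quick way to check whether the search domain is involved.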

Introspection Report

inspection-report-20240313_232359.tar.gz

McToel commented 8 months ago

After two days of troubleshooting, I found a solution by disabling DNS and re-enabling it with my router's IP address as the upstream DNS server:

microk8s disable dns
microk8s enable dns:192.168.178.1

I found the IP in /run/systemd/resolve/resolv.conf which, according to the Kubernetes docs, should be the correct resolve file when the host machine is using systemd-resolved.

What did not help was setting a different DNS server (I tried Cloudflare's 1.1.1.1). Also, dig worked the whole time, so I really do not understand what is going on.
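The workaround above can be sketched as a small script that reads the first upstream nameserver from a resolv.conf-style file and prints the two commands used in this comment. The function names `list_nameservers` and `print_enable_commands` are hypothetical, and the default path assumes the host uses systemd-resolved. The script only prints the commands and does not run them.

```shell
#!/usr/bin/env bash
# List the nameserver addresses in a resolv.conf-style file, one per line.
# Usage: list_nameservers [FILE]
list_nameservers() {
  local file=${1:-/run/systemd/resolve/resolv.conf}
  # Comment lines start with "#", so only real "nameserver" entries match.
  awk '$1 == "nameserver" { print $2 }' "$file"
}

# Print (but do not run) the disable/enable commands for the first nameserver.
# Usage: print_enable_commands [FILE]
print_enable_commands() {
  local upstream
  upstream=$(list_nameservers "$@" | head -n 1)
  if [ -z "$upstream" ]; then
    echo "no nameserver entry found" >&2
    return 1
  fi
  printf 'microk8s disable dns\nmicrok8s enable dns:%s\n' "$upstream"
}

# Example (on a host that uses systemd-resolved):
#   print_enable_commands /run/systemd/resolve/resolv.conf
```

Printing the commands instead of executing them leaves room to review the address first, which matters if the upstream resolver itself is the suspect.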

neoaggelos commented 8 months ago

Hi @McToel

Which microk8s version are you using? Starting from MicroK8s 1.26, MicroK8s will attempt to pick the upstream nameservers from /run/systemd/resolve/resolv.conf by default.

  1. How do you (and did you) enable dns?

  2. What's the output of snap run --shell microk8s -c '$SNAP/scripts/find-resolv-conf.py'?

  3. What's in your /run/systemd/resolve/resolv.conf?

McToel commented 8 months ago

I'm running MicroK8s v1.29.2 revision 6641.

It could be that I was running a version prior to 1.26 when I first enabled DNS. But while investigating the problem, I have disabled and re-enabled DNS a few times. In the beginning, I enabled DNS with microk8s enable dns.

Here is the output for 2. and 3.:

➜  ~ snap run --shell microk8s -c '$SNAP/scripts/find-resolv-conf.py'
/run/systemd/resolve/resolv.conf
➜  ~ cat /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 192.168.178.1
search fritz.box

ThijsBorst commented 8 months ago

I have exactly the same problem... I can't resolve it by pointing it to my Pi-hole either. It works for a second and then the problems start again.

BennyDeeDev commented 8 months ago

I had a similar issue and fixed it with conditional forwarding on my Pi-hole.

Maybe someone with more network knowledge can explain why this only happens to pods but not on other devices in my network.

Hoping this can help you out

ThijsBorst commented 7 months ago

I recreated the DNS service with the following settings, which worked for me.

First remove it: microk8s disable dns

Then recreate it: microk8s enable dns:<pi-hole address>

ErnyTech commented 7 months ago

root@dnsutils:/# nslookup deb.debian.org
Server:       10.152.183.10
Address:  10.152.183.10#53

Non-authoritative answer:
Name: deb.debian.org.fritz.box
Address: 45.76.93.104

It looks like you are suffering from DNS hijacking. Please check this: https://crapts.org/2024/04/21/all-fritz-box-modems-have-been-hijacked/
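The telltale sign in the output quoted above is that the answer's name is longer than the name that was queried. A minimal sketch of that check, using a hypothetical helper `appended_suffix`, compares the two names and prints whatever was appended. It only shows that a search domain was used; it cannot tell whether the returned address is legitimate.

```shell
#!/usr/bin/env bash
# Print the suffix that was appended to QUERIED to form ANSWERED.
# Returns non-zero when ANSWERED is not QUERIED plus an extra suffix.
# Usage: appended_suffix QUERIED ANSWERED
appended_suffix() {
  # Strip one trailing dot so "name" and "name." compare equal.
  local queried=${1%.} answered=${2%.}
  if [[ $answered == "$queried".* ]]; then
    # Remove the queried name and the separating dot from the front.
    printf '%s\n' "${answered#"$queried".}"
  else
    return 1
  fi
}

# Names taken from the nslookup output above.
appended_suffix deb.debian.org deb.debian.org.fritz.box
```

For the names in this issue, the printed suffix is the last entry of the pod's search list, which points at the upstream resolver answering for names under fritz.box.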

syedhaidy commented 7 months ago

Hi, I applied the steps below and the pods started to communicate with each other.

  1. microk8s disable dns
  2. microk8s enable dns:<ip address from /run/systemd/resolve/resolv.conf>

But after 1 hour, the pods suddenly stopped communicating with each other.

Please help me to resolve this issue.