MoJo2600 / pihole-kubernetes

PiHole on kubernetes

K8S with MetalLB - unable to access dns server #252

Closed i5Js closed 1 year ago

i5Js commented 1 year ago

Hi, this is very similar to issue #201. Everything was working fine, as usual, but suddenly I always get a timeout:

i5Js@virtsrv:~/K8s/pihole$ nslookup -all -debug -type=ANY -class=ANY www.google.es 192.168.1.116
Default server: 192.168.1.116
Address: 192.168.1.116#53
Default server: 192.168.1.7
Address: 192.168.1.7#53
Default server: 192.168.1.254
Address: 192.168.1.254#53

Set options:
  novc          nodebug     nod2
  search        recurse
  timeout = 0       retry = 3   port = 53   ndots = 1
  querytype = A         class = IN
  srchlist = noldor.local
;; Connection to 192.168.1.116#53(192.168.1.116) for www.google.es failed: timed out.
;; Connection to 192.168.1.116#53(192.168.1.116) for www.google.es failed: timed out.
;; connection timed out; no servers could be reached

;; Connection to 192.168.1.116#53(192.168.1.116) for www.google.es failed: timed out.

I have deployed pihole using Helm with the following values:

dnsHostPort:
  # -- set this to true to enable dnsHostPort
  enabled: false
  # -- default port for this pod
  port: 53

# -- Configuration for the DNS service on port 53
serviceDns:

  # -- deploys a mixed (TCP + UDP) Service instead of separate ones
  mixedService: false

  # -- `spec.type` for the DNS Service
  type: LoadBalancer
#  type: NodePort

  # -- The port of the DNS service
  port: 53

  # -- Optional node port for the DNS service
  nodePort: ""

  # -- `spec.externalTrafficPolicy` for the DNS Service
  externalTrafficPolicy: Local

  # -- A fixed `spec.loadBalancerIP` for the DNS Service
  loadBalancerIP: 192.168.1.116
  # -- A fixed `spec.loadBalancerIP` for the IPv6 DNS Service
  loadBalancerIPv6: ""

  # -- Annotations for the DNS service
  annotations:
    # metallb.universe.tf/address-pool: network-services
    metallb.universe.tf/allow-shared-ip: pihole-svc
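
(Applied with something along these lines; the repo alias, release name, and values file name are just examples, the repo URL is the chart's documented one:)

helm repo add mojo2600 https://mojo2600.github.io/pihole-kubernetes/
helm upgrade --install pihole mojo2600/pihole --namespace dns-home -f values.yaml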

And the services come up fine:

i5Js@virtsrv:~/K8s/pihole$ k get svc -n dns-home
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                    AGE
pihole-dns-tcp   LoadBalancer   10.97.41.222    192.168.1.116   53:31075/TCP               10m
pihole-dns-udp   LoadBalancer   10.108.8.199    192.168.1.116   53:31289/UDP               10m
pihole-web       ClusterIP      10.110.49.244   <none>          80/TCP,443/TCP,49312/TCP   10m

I can access the web UI and work with the pihole database.

I have upgraded MetalLB to the latest version, but it didn't help.

Any tips for troubleshooting?

Thanks all.

MoJo2600 commented 1 year ago

That is very strange. I don't see any obvious errors. Could you please try restarting the MetalLB speaker pods? I had this issue once where the speaker pods stopped advertising the IP addresses.
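
Something like this should do it (assuming MetalLB is installed in the usual metallb-system namespace with the default speaker DaemonSet name):

kubectl rollout restart daemonset/speaker -n metallb-system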

i5Js commented 1 year ago

Yes, indeed. I have even removed MetalLB and deployed it again. Same result… timeout.

MoJo2600 commented 1 year ago

But then I'm out of ideas. My network knowledge is not deep enough to give you a good answer on how to troubleshoot this. I don't think this is really related to pihole; I rather think it is a network issue of some kind. I would start up an nginx container and attach a MetalLB load balancer to it the same way it is configured for pihole, then see if you can reach it, just to rule out the pihole container itself. See the sketch below.
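
A rough sketch of such a test (all names and the spare IP are just examples; pick a free IP from your MetalLB pool):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-lb-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-lb-test
  template:
    metadata:
      labels:
        app: nginx-lb-test
    spec:
      containers:
        - name: nginx
          image: nginx:stable
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb-test
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.117   # example: a free IP in the same MetalLB pool
  selector:
    app: nginx-lb-test
  ports:
    - port: 80
      targetPort: 80

If curl http://192.168.1.117 returns the nginx welcome page, MetalLB advertising works and the problem is elsewhere.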

Did you try running nmap against the host? If I run nmap -sP -PR 192.168.1.116/32, I receive "Nmap done: 1 IP address (1 host up) scanned in 1.03 seconds" as the result for my pihole LB IP.

i5Js commented 1 year ago

Hey man, thank you. I am not good with networking either. I agree there is something else going on, but it's not in pihole, because it was working fine before. Find the nmap output below:

i5Js@virtsrv:~$ nmap -sP -PR 192.168.1.116/32
Starting Nmap 7.80 ( https://nmap.org ) at 2023-02-16 10:14 CET
Nmap scan report for 192.168.1.116
Host is up (0.00029s latency).
Nmap done: 1 IP address (1 host up) scanned in 0.00 seconds

MoJo2600 commented 1 year ago

Is the pihole pod in a Running and Ready state? Kubernetes will not route service traffic to the pod until it is ready.

i5Js commented 1 year ago

It is :(

i5Js@virtsrv:~/K8s/pihole$ k get pod -n dns-home -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
pihole-78f7955547-zmjmb   2/2     Running   0          3d15h   10.244.2.78   virtk8sn3   <none>           <none>

i5Js commented 1 year ago

There must be something wrong with my VM, MetalLB, or cluster, because I have deployed a new k3s cluster on some Raspberry Pis and it works fine:

i5Js@virtsrv:~/K8s/pihole$ nslookup www.google.es 192.168.1.120
Server:     192.168.1.120
Address:    192.168.1.120#53

Non-authoritative answer:
Name:   www.google.es
Address: 142.251.39.99
Name:   www.google.es
Address: 2a00:1450:400e:811::2003

So we can close this issue or keep it open for troubleshooting.

MoJo2600 commented 1 year ago

Maybe the IP address is assigned twice in your network?
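
You could check that from another machine on the LAN with ARP duplicate address detection (the interface name here is just an example):

sudo arping -D -I eth0 -c 3 192.168.1.116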

i5Js commented 1 year ago

mmmm I can't be that stupid hahahahahahah, I will check that too :)

i5Js commented 1 year ago

Thank God I am not: it's not duplicated.

i5Js commented 1 year ago

Well, bumping this topic.

Something is going on with the latest MetalLB, k3s, and pihole.

All the deployments are up and running, services, endpoints, everything, but requests to the DNS IP don't work:

i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ ping 192.168.1.122
PING 192.168.1.122 (192.168.1.122) 56(84) bytes of data.
64 bytes from 192.168.1.122: icmp_seq=1 ttl=64 time=0.460 ms
64 bytes from 192.168.1.122: icmp_seq=2 ttl=64 time=0.606 ms
64 bytes from 192.168.1.122: icmp_seq=3 ttl=64 time=0.544 ms
i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ k get svc -n dns-home
NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                    AGE
pihole-web       ClusterIP      10.43.74.191   <none>          80/TCP,443/TCP,49312/TCP   167m
pihole-dns-tcp   LoadBalancer   10.43.1.86     192.168.1.122   53:30668/TCP               167m
pihole-dns-udp   LoadBalancer   10.43.11.13    192.168.1.122   53:31335/UDP               167m
i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ k get svc -n kube-system
NAME                                       TYPE           CLUSTER-IP      EXTERNAL-IP                   PORT(S)                                     AGE
kube-dns                                   ClusterIP      10.43.0.10      <none>                        53/UDP,53/TCP,9153/TCP                      29h
metrics-server                             ClusterIP      10.43.219.226   <none>                        443/TCP                                     29h
nginx-ingress-nginx-controller-admission   ClusterIP      10.43.150.179   <none>                        443/TCP                                     24h
nginx-ingress-nginx-controller             LoadBalancer   10.43.106.5     192.168.1.121,192.168.1.120   80:30509/TCP,443:31278/TCP,3306:32679/TCP   24h
i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ k get ep -n dns-home
NAME             ENDPOINTS                                       AGE
pihole-web       10.42.2.10:443,10.42.2.10:80,10.42.2.10:49312   165m
pihole-dns-tcp   10.42.2.10:53                                   165m
pihole-dns-udp   10.42.2.10:53                                   165m

But....

i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ dig +tcp @192.168.1.122 www.google.es
;; Connection to 192.168.1.122#53(192.168.1.122) for www.google.es failed: timed out.
;; Connection to 192.168.1.122#53(192.168.1.122) for www.google.es failed: timed out.
;; Connection to 192.168.1.122#53(192.168.1.122) for www.google.es failed: timed out.

i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ dig +notcp @192.168.1.122 www.google.es
;; communications error to 192.168.1.122#53: timed out
;; communications error to 192.168.1.122#53: timed out
;; communications error to 192.168.1.122#53: timed out

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> +notcp @192.168.1.122 www.google.es
; (1 server found)
;; global options: +cmd
;; no servers could be reached

It's clear it's something outside the pod:

i5Js@raspiserver:~/K3s/deploy/helm_deploy/metallb$ k exec -it pihole-565b8855c-vpdf5 -n dns-home -c pihole bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@noldork3s-pihole:/# dig www.google.es

; <<>> DiG 9.16.33-Debian <<>> www.google.es
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6961
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c968c3085b5e195b (echoed)
;; QUESTION SECTION:
;www.google.es.         IN  A

;; ANSWER SECTION:
www.google.es.      104 IN  A   142.250.179.195

;; Query time: 111 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Apr 14 15:25:02 CEST 2023
;; MSG SIZE  rcvd: 83

I have run out of ideas…
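
The only other bisect I can think of: query the UDP NodePort directly on a node (the node IP below is a placeholder), which skips the MetalLB-advertised IP entirely. If that answers, the advertisement itself is what's broken:

dig -p 31335 @<node-ip> www.google.es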

i5Js commented 1 year ago

I think I got it.

I missed one big detail: I was using kube-vip for HA. THAT is the problem. Honestly I don't know why it doesn't work, but when I removed it, everything started to work.

Does anyone use kube-vip and have it working? Any ideas?

@MoJo2600 I think we can move this to Discussions; as suspected, there never was a problem with pihole.

i5Js commented 1 year ago

Fixed!

It was a kube-vip issue. If you use it, do not forget to add the following annotation to the pihole service:

kube-vip.io/ignore=true
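
With this chart, it can be set through the serviceDns.annotations block from the values above, for example:

serviceDns:
  annotations:
    metallb.universe.tf/allow-shared-ip: pihole-svc
    kube-vip.io/ignore: "true"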

After that, kube-vip will ignore the IP:

time="2023-04-16T07:29:18Z" level=info msg="service [pihole-dns-tcp] has an ignore annotation for kube-vip"
time="2023-04-16T07:29:18Z" level=info msg="service [pihole-dns-udp] has an ignore annotation for kube-vip"
time="2023-04-16T07:29:18Z" level=info msg="service [pihole-dns-tcp] has an ignore annotation for kube-vip"
time="2023-04-16T07:29:18Z" level=info msg="service [pihole-dns-udp] has an ignore annotation for kube-vip"
time="2023-04-16T08:23:32Z" level=info msg="service [pihole-dns-tcp] has an ignore annotation for kube-vip"
time="2023-04-16T08:23:32Z" level=info msg="service [pihole-dns-udp] has an ignore annotation for kube-vip"