containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman with custom network dns not working #23957

Closed flixman closed 3 weeks ago

flixman commented 1 month ago

Issue Description

Similarly to this issue, using podman run with the default network I can reach the internet:

podman run --rm -it --name testcontainer <registry>/gitea/runners/podman:extended podman login -u <login> -p <password> <registry>

However, if I create the network separately and then use it, the same command fails:

podman network create --subnet 10.1.0.0/24 --gateway 10.1.0.1 testnet
podman run --rm -it --network testnet --name testcontainer <registry>/gitea/runners/podman:extended podman login -u <login> -p <password> <registry>

This returns:

Error: authenticating creds for "<registry>": pinging container registry <registry>: Get "https://registry/v2/": dial tcp: lookup <registry>: Temporary failure in name resolution
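
A quick check (assuming the image provides cat) is to compare the resolv.conf the container gets with and without the custom network:

podman run --rm <registry>/gitea/runners/podman:extended cat /etc/resolv.conf
podman run --rm --network testnet <registry>/gitea/runners/podman:extended cat /etc/resolv.conf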

Steps to reproduce the issue

  1. create the network
  2. run the container attached to that network.

Describe the results you received

The container cannot reach the internet

Describe the results you expected

The container attached to a custom network works the same way it does with the default network.

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.2
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-1:2.1.12-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264'
  cpuUtilization:
    idlePercent: 96.7
    systemPercent: 1.33
    userPercent: 1.97
  cpus: 16
  databaseBackend: sqlite
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 2024
  hostname: altair
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.10.9-arch1-2
  linkmode: dynamic
  logDriver: journald
  memFree: 3601637376
  memTotal: 15909453824
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-1
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1
    path: /usr/lib/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-2024_09_06.6b38f07-1
    version: |
      pasta 2024_09_06.6b38f07
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 22h 55m 6.00s (Approximately 0.92 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: <registry>
      PullFromMirror: ""
    Prefix: docker.io
    PullFromMirror: ""
  search:
  - docker.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 500856545280
  graphRootUsed: 246386896896
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 89
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1724352649
  BuiltTime: Thu Aug 22 20:50:49 2024
  GitCommit: fcee48106a12dd531702d729d17f40f6e152027f
  GoVersion: go1.23.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

When using the default network (where it works), the container gets an IP address in the address space of the host.

Luap99 commented 1 month ago

What does "cannot reach the internet" mean? Your error shows a problem resolving a DNS name; do you actually have no network connectivity, or is it just DNS failing? Does DNS/networking work inside podman unshare --rootless-netns?
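
For example (assuming dig is available in the image), something along these lines separates raw connectivity from name resolution:

$ podman run --rm --network testnet <registry>/gitea/runners/podman:extended dig +short www.google.com @8.8.8.8   # external resolver, mostly tests connectivity
$ podman run --rm --network testnet <registry>/gitea/runners/podman:extended dig +short www.google.com            # container's configured resolver
$ podman unshare --rootless-netns dig +short www.google.com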

flixman commented 1 month ago

@Luap99 That is interesting! Let's see:

Running in my container with a custom network created through podman network create --subnet 10.1.0.0/24 --gateway 10.1.0.1 testnet:

- telnet 10.1.0.1 53: works
- telnet 8.8.8.8 53: works
- dig www.google.com @8.8.8.8: works
- dig www.google.com: error ";; communications error to 10.1.0.1#53: timed out"

Running inside podman unshare --rootless-netns: dig www.google.com works.

How is it possible that I can telnet, from inside my container, to port 53... but dig returns an error??

Luap99 commented 1 month ago

Ok, thanks for checking. This means that aardvark-dns is not responding on UDP, I would guess; telnet uses TCP, not UDP. You could try dig +tcp ... to see if DNS works over TCP.
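
For example (the 10.1.0.1 here is the resolver address from your error above):

$ dig +tcp www.google.com @10.1.0.1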

Can you check that aardvark-dns is running (while you have the container running), and if so, please provide the output of podman unshare --rootless-netns ss -tulpn.

flixman commented 1 month ago

dig +tcp ... returns the timeout as well, and aardvark-dns is running. The output of podman unshare --rootless-netns ss -tulpn is:

Netid          State           Recv-Q          Send-Q                   Local Address:Port                     Peer Address:Port          Process                                            
udp            UNCONN          0               0                             10.1.0.1:53                            0.0.0.0:*              users:(("aardvark-dns",pid=38793,fd=12))          
tcp            LISTEN          0               1024                          10.1.0.1:53                            0.0.0.0:*              users:(("aardvark-dns",pid=38793,fd=13))

Additionally: if I attach strace to the running aardvark-dns and its forks, when doing the dig (with either UDP or TCP) I get similar traces:

[pid 38801] accept4(13, {sa_family=AF_INET, sin_port=htons(35163), sin_addr=inet_addr("10.1.0.3")}, [128 => 16], SOCK_CLOEXEC|SOCK_NONBLOCK) = 5
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 5, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1342190720, u64=140416707998848}}) = 0
[pid 38801] accept4(13, 0x7fb59b5fb9d0, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 38801] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 38801] epoll_wait(4, [{events=EPOLLIN|EPOLLOUT, data={u32=1342190720, u64=140416707998848}}, {events=EPOLLIN, data={u32=0, u64=0}}], 1024, 2956) = 2
[pid 38801] recvfrom(5, "\0007", 2, 0, NULL, NULL) = 2
[pid 38801] recvfrom(5, "\260\356\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 55, 0, NULL, NULL) = 55
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 14
[pid 38801] connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("169.254.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 14, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946168448, u64=140417311976576}}) = 0
[pid 38801] epoll_wait(4, [], 1024, 3212) = 0
[pid 38801] epoll_wait(4, [], 1024, 1726) = 0
[pid 38801] epoll_wait(4, [], 1024, 59) = 0
[pid 38801] epoll_ctl(7, EPOLL_CTL_DEL, 14, NULL) = 0
[pid 38801] close(14)                   = 0
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 14
[pid 38801] connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 14, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=2348821376, u64=140417714629504}}) = 0
[pid 38801] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 38801] epoll_wait(4, [{events=EPOLLIN, data={u32=0, u64=0}}], 1024, 2306) = 1
[pid 38801] epoll_wait(4, [], 1024, 2306) = 0
[pid 38801] epoll_wait(4, [], 1024, 2686) = 0
[pid 38801] epoll_wait(4, [{events=EPOLLIN, data={u32=3002696448, u64=99616179192576}}], 1024, 4) = 1
[pid 38801] accept4(13, {sa_family=AF_INET, sin_port=htons(34497), sin_addr=inet_addr("10.1.0.3")}, [128 => 16], SOCK_CLOEXEC|SOCK_NONBLOCK) = 15
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 15, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946168448, u64=140417311976576}}) = 0
[pid 38801] accept4(13, 0x7fb59b5fb9d0, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 38801] epoll_wait(4, [{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=1342190720, u64=140416707998848}}, {events=EPOLLIN|EPOLLOUT, data={u32=1946168448, u64=140417311976576}}], 1024, 3) = 2
[pid 38801] recvfrom(15, "\0007", 2, 0, NULL, NULL) = 2
[pid 38801] recvfrom(15, "%\332\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 55, 0, NULL, NULL) = 55
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 17
[pid 38801] connect(17, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("169.254.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 17, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946170880, u64=140417311979008}}) = 0
[pid 38801] epoll_wait(4, [], 1024, 2)  = 0
[pid 38801] epoll_ctl(7, EPOLL_CTL_DEL, 14, NULL) = 0
[pid 38801] close(14)                   = 0
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 14
[pid 38801] connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("84.116.46.21")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 14, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946162816, u64=140417311970944}}) = 0
[pid 38801] epoll_wait(4, [{events=EPOLLOUT, data={u32=1946162816, u64=140417311970944}}], 1024, 1401) = 1
[pid 38801] getsockopt(14, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 38801] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 38801] sendto(14, "\0007", 2, MSG_NOSIGNAL, NULL, 0) = 2
[pid 38801] sendto(14, "^}\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 55, MSG_NOSIGNAL, NULL, 0) = 55
[pid 38801] futex(0x5a99b2f936f8, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 38801] epoll_wait(4,  <unfinished ...>
[pid 38799] <... futex resumed>)        = 0
[pid 38799] futex(0x5a99b2f936f8, FUTEX_WAIT_BITSET_PRIVATE, 12, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 38801] <... epoll_wait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=1946162816, u64=140417311970944}}], 1024, 1369) = 1
[pid 38801] recvfrom(14, "\0;", 2, 0, NULL, NULL) = 2
[pid 38801] recvfrom(14, "^}\201\200\0\1\0\1\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 59, 0, NULL, NULL) = 59
[pid 38801] recvfrom(14, 0x7fb574004cb0, 2, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Meaning: the request reaches aardvark-dns in both cases, but it seems that aardvark-dns is not able to query the upstream DNS itself?

Luap99 commented 1 month ago

Do you have any aardvark-dns errors logged in journald?

The strace part shows a TCP request, if I read this right. The async epoll API that we are using makes reading the strace a bit harder, but it seems we try to connect to upstream servers and then just remove the fd from the epoll again; I do not see any error logged or any write/read on the socket, which seems very odd. In the end it does seem to succeed when connecting to 84.116.46.21, but I guess by that time the original client had timed out (ref https://github.com/containers/aardvark-dns/issues/482#issuecomment-2253110977).

What is the content of /etc/resolv.conf on the host and inside podman unshare --rootless-netns? And when you say dig inside podman unshare --rootless-netns worked, which upstream server did it use?
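
That is, something like:

$ cat /etc/resolv.conf
$ podman unshare --rootless-netns cat /etc/resolv.conf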

flixman commented 1 month ago

In /etc/resolv.conf I have a bunch of nameservers, and inside podman unshare --rootless-netns I have the same, but a new one gets prepended to the list: nameserver 169.254.0.1. When executing dig www.google.com inside podman unshare --rootless-netns I get three timeouts for 169.254.0.1, and then it works successfully with another one (over UDP, by the way).

With the container running, dig www.google.com results in aardvark-dns on the host writing a number of "dns request got empty response" messages to the log.

Luap99 commented 1 month ago

169.254.0.1

This is the special DNS forward address we use for pasta, so this address is expected to work there. If it doesn't, it sounds like a pasta bug. If you look in journald, do you see a warning from pasta that it didn't find nameservers?

You can also just test from the CLI with pasta --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1. If this fails, it is a pasta bug.

flixman commented 1 month ago

Indeed, it fails:

$ pasta --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out

; <<>> DiG 9.20.1 <<>> google.com @169.254.0.1
;; global options: +cmd
;; no servers could be reached

Luap99 commented 1 month ago

Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first

What do the routes look like in the container (pasta --config-net ip route)? If the routes are fine, you can use the --pcap option to capture a pcap file so we can have a look at the packets being sent, i.e. pasta --config-net --pcap /tmp/dns.pcap --dns-forward 169.254.0.1 dig google.com @169.254.0.1

cc @sbrivio-rh @dgibson

flixman commented 1 month ago

The routes seem to be fine:

$ pasta --config-net ip route
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first
default via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.20 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.21 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600 
192.168.178.1 dev wlp2s0 proto dhcp scope link metric 600 

Please find attached the trace dns.pcap.txt (remove the .txt suffix; it seems GH does not support .pcap):

Luap99 commented 1 month ago

Please find attached the trace dns.pcap.txt (remove the .txt suffix; it seems GH does not support .pcap):

    4   0.007273 192.168.178.129 → 169.254.0.1  DNS 93 Standard query 0xaab5 A google.com OPT
   12   5.012949 192.168.178.129 → 169.254.0.1  DNS 93 Standard query 0xaab5 A google.com OPT
   13  10.018316 192.168.178.129 → 169.254.0.1  DNS 93 Standard query 0xaab5 A google.com OPT

The request was sent out but there was never a reply. Can you also do a packet capture on the host to see if pasta makes an actual request to the upstream server there, or if pasta eats it internally and never forwards it? I wonder if pasta somehow failed to parse the servers from resolv.conf, but in that case it should print a warning like the "multiple default routes" one. There is also the --debug pasta option, which also logs the internal packet flow, so maybe there is something interesting in there.
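
Something along these lines on the host's upstream interface should do (wlp2s0 taken from the routes you posted; run it while repeating the failing dig):

$ sudo tcpdump -n -i wlp2s0 -s0 -w host_dns.pcap port 53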

But I guess at this point I have to leave it to @sbrivio-rh and @dgibson (the pasta maintainers) if they have a clue here.

flixman commented 1 month ago

@sbrivio-rh @dgibson: I have run the pasta command again with the --debug option. Can you guys give me a hand?

$ pasta --debug --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1
0.0010: Multiple default IPv4 routes, picked first
0.0010: Multiple default IPv6 routes, picked first
0.0118: Template interface: wlp2s0 (IPv4), wlp2s0 (IPv6)
0.0118: Namespace interface: wlp2s0
0.0118: MAC:
0.0118:     host: 9a:55:9a:55:9a:55
0.0118:     NAT to host 127.0.0.1: 192.168.178.1
0.0118: DHCP:
0.0119:     assign: 192.168.178.129
0.0119:     mask: 255.255.255.0
0.0119:     router: 192.168.178.1
0.0119: DNS:
0.0119:     192.168.178.4
0.0119:     84.116.46.21
0.0119:     84.116.46.20
0.0119:     84.116.46.21
0.0119:     169.254.0.1
0.0119:     192.168.178.1
0.0119:     192.168.178.4
0.0119: DNS search list:
0.0119:     .
0.0119:     NAT to host ::1: fe80::4ad3:43ff:feda:bb88
0.0119: NDP/DHCPv6:
0.0120:     assign: 2001:1c00:1804:b700:f2b3:fadc:4fa3:f578
0.0120:     router: fe80::4ad3:43ff:feda:bb88
0.0120:     our link-local: fe80::4ad3:43ff:feda:bb88
0.0120: DNS:
0.0120:     2001:b88:1002::10
0.0120:     2001:b88:1202::10
0.0120:     2001:730:3e42:1000::53
0.0120:     2001:b88:1002::10
0.0120: DNS search list:
0.0120:     .
0.0186: SO_PEEK_OFF not supported
0.0305: Flow 0 (NEW): FREE -> NEW
0.0305: Flow 0 (INI): NEW -> INI
0.0305: Flow 0 (INI): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => ?
0.0306: Flow 0 (TGT): INI -> TGT
0.0306: Flow 0 (TGT): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0306: Flow 0 (UDP flow): TGT -> TYPED
0.0306: Flow 0 (UDP flow): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0308: Flow 0 (UDP flow): Side 0 hash table insert: bucket: 41306
0.0308: Flow 0 (UDP flow): TYPED -> ACTIVE
0.0308: Flow 0 (UDP flow): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0487: ICMP error on UDP socket 179: No route to host
;; communications error to 169.254.0.1#53: timed out
5.0351: Flow 1 (NEW): FREE -> NEW
5.0351: Flow 1 (INI): NEW -> INI
5.0351: Flow 1 (INI): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => ?
5.0351: Flow 1 (TGT): INI -> TGT
5.0352: Flow 1 (TGT): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => HOST [0.0.0.0]:57747 -> [192.168.178.4]:53
5.0352: Flow 1 (UDP flow): TGT -> TYPED
5.0352: Flow 1 (UDP flow): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => HOST [0.0.0.0]:57747 -> [192.168.178.4]:53
5.0353: Flow 1 (UDP flow): Side 0 hash table insert: bucket: 235154
5.0353: Flow 1 (UDP flow): TYPED -> ACTIVE
5.0353: Flow 1 (UDP flow): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => HOST [0.0.0.0]:57747 -> [192.168.178.4]:53
5.0498: ICMP error on UDP socket 244: No route to host
;; communications error to 169.254.0.1#53: timed out
10.0406: Flow 2 (NEW): FREE -> NEW
10.0406: Flow 2 (INI): NEW -> INI
10.0407: Flow 2 (INI): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => ?
10.0407: Flow 2 (TGT): INI -> TGT
10.0407: Flow 2 (TGT): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => HOST [0.0.0.0]:59697 -> [192.168.178.4]:53
10.0407: Flow 2 (UDP flow): TGT -> TYPED
10.0407: Flow 2 (UDP flow): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => HOST [0.0.0.0]:59697 -> [192.168.178.4]:53
10.0408: Flow 2 (UDP flow): Side 0 hash table insert: bucket: 10518
10.0408: Flow 2 (UDP flow): TYPED -> ACTIVE
10.0408: Flow 2 (UDP flow): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => HOST [0.0.0.0]:59697 -> [192.168.178.4]:53
10.0590: ICMP error on UDP socket 245: No route to host
;; communications error to 169.254.0.1#53: timed out

; <<>> DiG 9.20.1 <<>> google.com @169.254.0.1
;; global options: +cmd
;; no servers could be reached

sbrivio-rh commented 1 month ago

I was looking into this right now. Quick question: is 2001:b88:1002::10 a valid resolver? What happens if you dig passt.top @2001:b88:1002::10?

sbrivio-rh commented 1 month ago

Same for 192.168.178.4: does it work?

flixman commented 1 month ago

Thank you for your help! Yes, 192.168.178.4 is valid. About the "dig passt.top @2001:b88:1002::10", this also seems to work:

$ pasta --config-net --dns-forward 169.254.0.1 dig passt.top @2001:b88:1002::10
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first

; <<>> DiG 9.20.1 <<>> passt.top @2001:b88:1002::10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40536
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;passt.top.                     IN      A

;; ANSWER SECTION:
passt.top.              300     IN      A       88.198.0.164

;; Query time: 60 msec
;; SERVER: 2001:b88:1002::10#53(2001:b88:1002::10) (UDP)
;; WHEN: Wed Sep 18 20:45:38 CEST 2024
;; MSG SIZE  rcvd: 54

sbrivio-rh commented 1 month ago

Weird, because when pasta (and not a process running under pasta) tries to contact 192.168.178.4, it gets an error ("No route to host"). That might be an ICMP error or netfilter (nftables or iptables) blocking it.

What do the routes look like on the host (not the ones pasta copies)? Any particular firewall rule pasta could hit?

flixman commented 1 month ago

With respect to the routes on the host, this is how they look:

$ ip route
default via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600 
default via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800 
84.116.46.20 via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600 
84.116.46.20 via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800 
84.116.46.21 via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600 
84.116.46.21 via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800 
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600 
192.168.178.0/24 dev eno1 proto kernel scope link src 192.168.178.213 metric 800 
192.168.178.1 dev wlp2s0 proto dhcp scope link src 192.168.178.129 metric 600 
192.168.178.1 dev eno1 proto dhcp scope link src 192.168.178.213 metric 800 

And as for the firewall settings, I do not have any rules in nft that could justify this behavior: nft_rules.txt

dgibson commented 1 month ago

Well, I'm deeply baffled. You're able to manually contact the DNS server from the host, but when pasta tries it gets an ICMP error. We could try to get a packet capture on the host - perhaps that would shed some more light on where the error is originating. In fact, even better would be to get two different packet traces on the host: one querying the nameserver directly from the host with dig, the second doing a similar query from the container via pasta. Perhaps we'll see some difference that helps explain things.

sbrivio-rh commented 1 month ago

...or maybe it has something to do with us bind()ing and connect()ing UDP sockets (dig doesn't do that) when two sets of almost-identical routes (metrics, interface, and source differ) are present?

I can try and see if it can be reproduced with a dummy interface with similar routes.

dgibson commented 1 month ago

...or maybe it has something to do with us bind()ing and connect()ing UDP sockets (dig doesn't do that)

Doesn't bind() or doesn't connect()? I'm pretty sure it has to do one of them in order to receive anything at all.

when two sets of almost-identical routes (metrics, interface, and source differ) are present?

I can try and see if it can be reproduced with a dummy interface with similar routes.

sbrivio-rh commented 1 month ago

...or maybe it has something to do with us bind()ing and connect()ing UDP sockets (dig doesn't do that)

Doesn't bind() or doesn't connect()? I'm pretty sure it has to do one of them in order to receive anything at all.

Whoops, sorry, I just assumed. It actually does both:

$ strace -e connect,bind dig root-servers.net @1.1.1.1 >/dev/null
bind(11, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
connect(11, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.1.1.1")}, 16) = 0
+++ exited with 0 +++

but it bind()s to 0.0.0.0, port 0, so that's not quite the bind()ing we do.

dgibson commented 1 month ago

I think binding to 0.0.0.0:0 is basically a no-op, which means I think the kernel will implicitly bind the socket at connect() time to an address and port of the kernel's choosing.

flixman commented 1 month ago

@dgibson Sorry for not answering yesterday, had a pretty busy day. About your request "[...] one querying the nameserver directly from the host with dig, the second doing a similar query from the container via pasta. [...]": Do you mean something like the trace I provided on this comment, for the host?

Besides this: I have just updated the system and rebooted it. I have also disabled the DNS server I had in 192.168.178.4 to use the one provided by my router (192.168.178.1), and I have removed one of the interfaces. None of this has helped in solving this issue :-/ (but I have a cleaner system, I guess xD). These are the results:

$ podman unshare --rootless-netns more /etc/resolv.conf 
nameserver 169.254.0.1
nameserver 192.168.178.1
nameserver 84.116.46.21
nameserver 84.116.46.20
nameserver 2001:b88:1002::10
nameserver 2001:b88:1202::10
nameserver 2001:730:3e42:1000::53

$ podman unshare --rootless-netns ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
2: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether ee:65:d6:01:89:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.178.129/24 metric 600 brd 192.168.178.255 scope global wlp2s0
       valid_lft forever preferred_lft forever
    inet6 2001:1c00:1804:b700:3e55:76ff:fe0f:c901/64 scope global nodad mngtmpaddr noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 2001:1c00:1804:b700:1034:e18c:123f:2add/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::ec65:d6ff:fe01:890d/64 scope link nodad tentative proto kernel_ll 
       valid_lft forever preferred_lft forever

$ podman unshare --rootless-netns ip route
default via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.20 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.21 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600 
192.168.178.1 dev wlp2s0 proto dhcp scope link metric 600 

$ podman unshare --rootless-netns dig passt.top
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 192.168.178.1#53: timed out

; <<>> DiG 9.20.2 <<>> passt.top
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45940
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;passt.top.                     IN      A

;; ANSWER SECTION:
passt.top.              300     IN      A       88.198.0.164

;; Query time: 33 msec
;; SERVER: 84.116.46.21#53(84.116.46.21) (UDP)
;; WHEN: Sat Sep 21 10:44:18 CEST 2024
;; MSG SIZE  rcvd: 54

$ podman unshare --rootless-netns dig passt.top @192.168.178.1
;; communications error to 192.168.178.1#53: timed out
;; communications error to 192.168.178.1#53: timed out
;; communications error to 192.168.178.1#53: timed out

; <<>> DiG 9.20.2 <<>> passt.top @192.168.178.1
;; global options: +cmd
;; no servers could be reached

$ podman unshare --rootless-netns dig passt.top @84.116.46.21 

; <<>> DiG 9.20.2 <<>> passt.top @84.116.46.21
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9750
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;passt.top.                     IN      A

;; ANSWER SECTION:
passt.top.              300     IN      A       88.198.0.164

;; Query time: 40 msec
;; SERVER: 84.116.46.21#53(84.116.46.21) (UDP)
;; WHEN: Sat Sep 21 10:45:11 CEST 2024
;; MSG SIZE  rcvd: 54

$ podman unshare --rootless-netns dig passt.top @192.168.178.1 +tcp
;; Connection to 192.168.178.1#53(192.168.178.1) for passt.top failed: timed out.
;; no servers could be reached

So: it seems the problem is not related to TCP/UDP, it works with a remote DNS server but not with that of my router (I have also rebooted the router). If I resolve by querying 84.116.46.21 from the host, this works... but not when setting it as a DNS forward resolver in pasta:

$ dig google.com @84.116.46.21

; <<>> DiG 9.20.2 <<>> google.com @84.116.46.21
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 719
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             63      IN      A       142.250.179.174

;; Query time: 33 msec
;; SERVER: 84.116.46.21#53(84.116.46.21) (UDP)
;; WHEN: Sat Sep 21 10:49:29 CEST 2024
;; MSG SIZE  rcvd: 55

$ pasta --config-net --dns-forward 84.116.46.21 dig google.com @169.254.0.1
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out

; <<>> DiG 9.20.2 <<>> google.com @169.254.0.1
;; global options: +cmd
;; no servers could be reached

dgibson commented 1 month ago

@dgibson Sorry for not answering yesterday, had a pretty busy day. About your request "[...] one querying the nameserver directly from the host with dig, the second doing a similar query from the container via pasta. [...]": Do you mean something like the trace I provided on this comment, for the host?

Roughly, yes. The most important thing is getting the trace from the host, not from the container or pasta as that earlier trace was. But then it would also be useful to see the difference in trace between running dig directly on the host, and running dig within the container.

Besides this: I have just updated the system and rebooted it. I have also disabled the DNS server I had in 192.168.178.4 to use the one provided by my router (192.168.178.1), and I have removed one of the interfaces. None of this has helped in solving this issue :-/ (but I have a cleaner system, I guess xD). These are the results:

Right. I wouldn't particularly expect those changes to make any difference here... but then the symptoms we're seeing are so weird, I don't really know for sure.

$ podman unshare --rootless-netns more /etc/resolv.conf 
nameserver 169.254.0.1
nameserver 192.168.178.1
nameserver 84.116.46.21
nameserver 84.116.46.20
nameserver 2001:b88:1002::10
nameserver 2001:b88:1202::10
nameserver 2001:730:3e42:1000::53

$ podman unshare --rootless-netns ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
2: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether ee:65:d6:01:89:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.178.129/24 metric 600 brd 192.168.178.255 scope global wlp2s0
       valid_lft forever preferred_lft forever
    inet6 2001:1c00:1804:b700:3e55:76ff:fe0f:c901/64 scope global nodad mngtmpaddr noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 2001:1c00:1804:b700:1034:e18c:123f:2add/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::ec65:d6ff:fe01:890d/64 scope link nodad tentative proto kernel_ll 
       valid_lft forever preferred_lft forever

$ podman unshare --rootless-netns ip route
default via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.20 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.21 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600 
192.168.178.1 dev wlp2s0 proto dhcp scope link metric 600 

$ podman unshare --rootless-netns dig passt.top
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 192.168.178.1#53: timed out

; <<>> DiG 9.20.2 <<>> passt.top
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45940
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;passt.top.                     IN      A

;; ANSWER SECTION:
passt.top.              300     IN      A       88.198.0.164

;; Query time: 33 msec
;; SERVER: 84.116.46.21#53(84.116.46.21) (UDP)
;; WHEN: Sat Sep 21 10:44:18 CEST 2024
;; MSG SIZE  rcvd: 54

$ podman unshare --rootless-netns dig passt.top @192.168.178.1
;; communications error to 192.168.178.1#53: timed out
;; communications error to 192.168.178.1#53: timed out
;; communications error to 192.168.178.1#53: timed out

; <<>> DiG 9.20.2 <<>> passt.top @192.168.178.1
;; global options: +cmd
;; no servers could be reached

$ podman unshare --rootless-netns dig passt.top @84.116.46.21 

; <<>> DiG 9.20.2 <<>> passt.top @84.116.46.21
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9750
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;passt.top.                     IN      A

;; ANSWER SECTION:
passt.top.              300     IN      A       88.198.0.164

;; Query time: 40 msec
;; SERVER: 84.116.46.21#53(84.116.46.21) (UDP)
;; WHEN: Sat Sep 21 10:45:11 CEST 2024
;; MSG SIZE  rcvd: 54

$ podman unshare --rootless-netns dig passt.top @192.168.178.1 +tcp
;; Connection to 192.168.178.1#53(192.168.178.1) for passt.top failed: timed out.
;; no servers could be reached

So: it seems the problem is not related to TCP/UDP, it works with a remote DNS server but not with that of my router (I have also rebooted the router).

That's what's so odd. The queries are failing if they're both to the local nameserver and from pasta. Other combinations appear to be working.

If I resolve by querying 84.116.46.21 from the host, this works... but not when setting it as a DNS forward resolver in pasta:

Actually, this one makes sense.

$ pasta --config-net --dns-forward 84.116.46.21 dig google.com @169.254.0.1

--dns-forward sets the address pasta forwards from, not the address it forwards to. So with this setting, pasta is no longer forwarding queries @169.254.0.1, so a timeout is expected.
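
In other words, the address given to --dns-forward is the one the client inside the namespace has to query; compare the two invocations from this thread:

$ pasta --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1   # forward address and queried server match
$ pasta --config-net --dns-forward 84.116.46.21 dig google.com @169.254.0.1  # they don't, so the query can only time out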

pasta has an internal concept of the "host DNS", which is where it directs queries once it has forwarded them. But... it looks like the only way to configure that is via the host's resolv.conf, which is a bit of an oversight. @sbrivio-rh, did I miss something?

;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out

; <<>> DiG 9.20.2 <<>> google.com @169.254.0.1
;; global options: +cmd
;; no servers could be reached

flixman commented 1 month ago

@dgibson it seems I have gotten somewhere now. If I remove the IP of my router from resolv.conf, it works:


; <<>> DiG 9.20.2 <<>> google.com @169.254.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49114
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             112     IN      A       142.250.179.174

;; Query time: 20 msec
;; SERVER: 169.254.0.1#53(169.254.0.1) (UDP)
;; WHEN: Mon Sep 23 18:11:46 CEST 2024
;; MSG SIZE  rcvd: 55

It seems there is something broken in the DNS cache my router maintains. The only explanation I can find is that pasta was using the first IP it found, which was not working, and then it was not trying further (there are two other 84.116.* IPs there)?

dgibson commented 1 month ago

@dgibson it seems I have gotten somewhere now. If I remove the IP of my router from resolv.conf, it works:


; <<>> DiG 9.20.2 <<>> google.com @169.254.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49114
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             112     IN      A       142.250.179.174

;; Query time: 20 msec
;; SERVER: 169.254.0.1#53(169.254.0.1) (UDP)
;; WHEN: Mon Sep 23 18:11:46 CEST 2024
;; MSG SIZE  rcvd: 55

It seems there is something broken in the DNS cache my router maintains. The only explanation I can find is that pasta was using the first IP it found, which was not working, and then it was not trying further (there are two other 84.116.* IPs there)?

Yes, pasta forwards all queries to the first host-side resolv.conf entry; falling back to other servers isn't implemented there. It would actually be quite hard to do: we're forwarding at the packet level, and to implement fallback we'd need to actually interpret what the queries mean to some extent, which we don't really want to do.
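
As a practical workaround in the meantime (based on what already worked for you earlier in this thread), you can make sure the first nameserver in the host's /etc/resolv.conf is one that actually answers, e.g.:

nameserver 84.116.46.21
nameserver 84.116.46.20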

What I'm still baffled by is that you seemed to be able to query your router as nameserver from the host, but it failed via pasta. I'm not sure what difference could cause that.

sbrivio-rh commented 1 month ago

I tried to reproduce this (same sets of routes, same addresses, with the help of a dummy device) in a network namespace with pasta --config-net, but everything works. I'll try in a VM next.

My focus is on why you'd get host unreachable with pasta for a DNS server that is reachable by dig itself.

sbrivio-rh commented 1 month ago

pasta has an internal concept of the "host DNS" which is where it directs queries to once it's forwarded them. But... it looks like the only way to configure that is via the host's resolv.conf, which is a bit of an oversight. @sbrivio-rh , did I miss something?

It's also possible to configure that using --dns / -D. The problem is that it stopped working recently, it seems. If I try:

./pasta -f --config-net -D 185.12.64.1 --dns-forward 5.5.5.5

where 185.12.64.1 is the first resolver address I have in /etc/resolv.conf, a dig google.com @5.5.5.5 becomes:

bind(211, {sa_family=AF_INET, sin_port=htons(59628), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 211, {events=EPOLLIN, data={u32=54022, u64=4295021318}}) = 0
connect(211, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
recvmmsg(211, 0x7ffe7ce2e9c0, 1024, MSG_DONTWAIT, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendmmsg(211, [{msg_hdr={msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, msg_iov=[{iov_base="2z\1 \0\1\0\0\0\0\0\1\6google\3com\0\0\1\0\1\0\0)\4"..., iov_len=51}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=51}], 1, MSG_NOSIGNAL) = 1

that is, for some reason, --dns-forward maps things to an unspecified address if that's overridden by -D.

sbrivio-rh commented 1 month ago

Ah, yes, that stopped working (intentionally) with commit 0b25cac94eca ("conf: Treat --dns addresses as guest visible addresses").

I see your reasons there, but it's fairly problematic that we can't override DNS resolvers for --dns-forward with -D. I think we need to either implement another option (say, --host-dns or --dns-host) or revert that commit.

sbrivio-rh commented 1 month ago

I see your reasons there, but it's fairly problematic that we can't override DNS resolvers for --dns-forward with -D. I think we need to either implement another option (say, --host-dns or --dns-host) or revert that commit.

This is implemented by https://archives.passt.top/passt-dev/20241003051402.2548424-1-david@gibson.dropbear.id.au/ by the way.

@flixman, as I can't reproduce this in a nested namespace, before trying to build something that looks like your setup and your router with VMs: could you capture DNS queries and responses (say, tcpdump -nvi eth0 -s0 -w dns.pcap port 53) on the upstream interface of the host, with your router address back in /etc/resolv.conf, while trying one (successful) query using dig and one (failing) from the container with pasta?

I'm trying to find out if, for whatever reason, dig gets an answer from another server that is not your router, while pasta doesn't try further resolvers and so won't.

flixman commented 1 month ago

Hey @sbrivio-rh, my apologies for these last 10 days without any sign of life; we are going through a reorg here and everything is a bit chaotic. Give me a couple of days and I will try to reproduce this. Thank you!

flixman commented 3 weeks ago

@sbrivio-rh I have run the tests you requested, and here are the traces:

working: dig google.com - dns_working.pcap.txt
failing: podman unshare --rootless-netns dig google.com - dns_failing.pcap.txt

dgibson commented 3 weeks ago

@flixman thanks for the traces. Based on these it's actually looking like this might be a lot less mysterious than we thought.

The working trace shows a number of queries going to the home gateway 192.168.178.1, without response, then a query and response to 84.116.46.21, presumably a result of dig falling back to the next nameserver in the list. The failing trace is almost identical.

So my working theory is simply that DNS resolution never worked on the gateway, but while dig on the host was able to fall back to other nameservers, that doesn't happen under pasta. We're only listing the single virtual DNS resolver within the container, so dig itself can't fall back, and pasta can't fall back to secondary servers without a much more detailed understanding of what's going on with the queries than it possesses.

We can test this by forcing dig on the host to use only the local gateway:

$ dig www.google.com @192.168.178.1

My expectation is that this will fail, much like dig inside pasta was failing. Failing on the host, it may give a more meaningful error message: at present we don't propagate UDP errors seen on the host to ICMP errors that the guest can see. That should be possible for at least some cases, and we'd like to do it, but it's a non-trivial job, so it probably won't happen soon.

Luap99 commented 3 weeks ago

We're only listing the single virtual DNS resolver within the container, so dig itself can't fall back, and pasta can't fall back to secondary servers without a much more detailed understanding of what's going on with the queries than it possesses.

FYI, that is not true. We add other host resolvers to the container as well, so fallback should be possible in theory for any client.

$ podman run --rm quay.io/libpod/testimage:20240123 cat /etc/resolv.conf 
nameserver 169.254.1.1
nameserver 192.168.188.1 <--my host resolver

What is different here, as mentioned in the original report, is the use of custom networks, because they use aardvark-dns by default; in that case ONLY the aardvark-dns IP will be in resolv.conf, so no client can perform the retry. And while aardvark-dns does retry, what it failed to do in the past was to properly adjust the timeouts: if the client timeout is 5s, we also had a 5s timeout when we forwarded the request. This meant that even though aardvark-dns tries further resolvers, by the time we finally got the answer and tried to respond, the client would long since have closed its socket and given up. This was fixed in https://github.com/containers/aardvark-dns/pull/514 (not in any release yet).

So if the issue really is that the first nameserver in resolv.conf is not working, then I would say this is expected for now, until you have the new aardvark-dns version.

flixman commented 3 weeks ago

@dgibson Indeed, it fails. Seems everything is clear, then. Thank you very much!

dgibson commented 3 weeks ago

@Luap99 sorry, didn't mean to imply that multiple container nameservers are impossible with podman. I'd been under the impression that only one was listed in this particular setup, although looking back, I'm not sure that's correct either. But in any case, it seems like the issue is explained.

@flixman fwiw, we recently added a --dns-host option to pasta to allow controlling where it forwards DNS queries on the host side, overriding the host's /etc/resolv.conf. Once that reaches a packaged version it might be useful to you. Then again, since resolution on your router seems to simply not work, it's probably best to remove it from the host's resolv.conf anyway (or else reconfigure the router so it does work).