
Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Resolving peer pod DNS names often hangs for a long time in user mode #15972

Closed (martinpitt closed this issue 6 months ago)

martinpitt commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Our service has a controller container which spawns arbitrarily many "session-123UUID" instance containers. To talk to the latter, it relies on DNS to figure out their addresses. When running in user mode, it often fails to resolve the names and, worse, it often hangs for up to a minute (!) before returning the failure.

To illustrate that, I wrote a little reproducer shell script (please drop the .txt suffix; GitHub does not allow uploading a .sh). It creates a "test" network and, inside it, a "test-ctrl" container which tries to resolve the DNS names of peer containers. The script then creates 5 "test-i" instance containers, waits for a second to let podman settle down (just in case), and tries to resolve their names (sketched below).
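
For reference, a rough sketch of what the reproducer does (the attached script is authoritative; the Python snippet piped into the container is reconstructed from the output below and partly hypothetical):

#!/bin/sh
set -ux
NETWORK=test
IMAGE=registry.fedoraproject.org/fedora

podman network rm --force --time 0 $NETWORK
podman network create $NETWORK
podman run -d --name test-ctrl --network $NETWORK $IMAGE sleep infinity

for i in $(seq 5); do
    podman run -d --name test-$i --network $NETWORK $IMAGE sleep infinity
    sleep 1
    echo "resolving test-$i in test-ctrl..."
    # time a gethostbyname() call for the peer container's name
    podman exec -i test-ctrl python3 <<EOF
import socket, time
t = time.time()
try:
    print(socket.gethostbyname("test-$i"))
finally:
    print("gethostbyname took", time.time() - t, "seconds")
EOF
done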

Describe the results you received:

Sometimes, resolving a pod name fails:

+ echo 'resolving test-2 in test-ctrl...'
+ podman exec -i test-ctrl python3
gethostbyname took 0.18181109428405762 seconds
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
socket.gaierror: [Errno -2] Name or service not known

and further on, it even hangs for a long time:

resolving test-3 in test-ctrl...
+ podman exec -i test-ctrl python3
gethostbyname took 56.06182265281677 seconds

Describe the results you expected:

Sometimes all is well, and all 5 names resolve quickly, like this:

+ podman exec -i test-ctrl python3
10.89.0.149
gethostbyname took 0.004895925521850586 seconds

Additional information you deem important (e.g. issue happens only occasionally):

It always seems to work fine when running this as root, so this seems related to netavark?

Output of podman version:

Client:       Podman Engine
Version:      4.2.1
API Version:  4.2.1
Go Version:   go1.19
Built:        Wed Sep  7 21:57:53 2022
OS/Arch:      linux/amd64

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.3-1.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.3, commit: '
  cpuUtilization:
    idlePercent: 95.57
    systemPercent: 0.96
    userPercent: 3.47
  cpus: 8
  distribution:
    distribution: fedora
    version: "37"
  eventLogger: journald
  hostname: abakus
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.19.10-300.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 5785468928
  memTotal: 16535416832
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-5.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 7h 42m 12.00s (Approximately 0.29 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/martin/.config/containers/storage.conf
  containerStore:
    number: 9
    paused: 0
    running: 8
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/martin/.local/share/containers/storage
  graphRootAllocated: 228501188608
  graphRootUsed: 189984272384
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 43
  runRoot: /run/user/1000/containers
  volumePath: /home/martin/.local/share/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580673
  BuiltTime: Wed Sep  7 21:57:53 2022
  GitCommit: ""
  GoVersion: go1.19
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.2.1-2.fc37.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):

Host is a physical Lenovo X1 x86_64 laptop running Fedora 37.

mheon commented 2 years ago

Seems like a potential performance issue in Aardvark? @flouthoc PTAL

flouthoc commented 2 years ago

Could you please let me know the netavark and aardvark versions?

I am quite sure the hang issue will be fixed by https://github.com/containers/aardvark-dns/pull/220 and https://github.com/containers/netavark/pull/375. A fix for the failure of lookups on startup was attempted here: https://github.com/containers/aardvark-dns/pull/148 and https://github.com/containers/netavark/pull/308

All of these patches will be available in the upcoming release.

martinpitt commented 2 years ago

Sure! rpm -qa|grep vark says aardvark-dns-1.1.0-1.fc37.x86_64 and netavark-1.1.0-1.fc37.x86_64

defanator commented 2 years ago

Observing similar issues here with podman 4.2.1 on Fedora 36 (x86_64) on AWS. We are running a number of tests involving lots of requests to Ubuntu and Debian repositories, and the tests sporadically fail with something like:

E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/a/apt/apt-transport-https_1.6.14_all.deb  Temporary failure resolving 'archive.ubuntu.com'
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Here are a few pcaps captured on the host instance that correspond to the failures:

10:35:37.467387 eth0  Out IP (tos 0x0, ttl 62, id 38756, offset 0, flags [DF], proto UDP (17), length 75)
    10.212.32.183.2794 > 10.212.32.2.domain: [bad udp cksum 0x56a9 -> 0x9f77!] 30815+ A? archive.ubuntu.com.dns.podman. (47)
10:35:37.467399 eth0  Out IP (tos 0x0, ttl 62, id 38757, offset 0, flags [DF], proto UDP (17), length 75)
    10.212.32.183.2794 > 10.212.32.2.domain: [bad udp cksum 0x56a9 -> 0x1d7a!] 57180+ AAAA? archive.ubuntu.com.dns.podman. (47)
10:35:37.467641 eth0  In  IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto UDP (17), length 150)
    10.212.32.2.domain > 10.212.32.183.2794: [udp sum ok] 30815 NXDomain q: A? archive.ubuntu.com.dns.podman. 0/1/0 ns: . SOA a.root-servers.net. nstld.verisign-grs.com. 2022101002 1800 900 604800 86400 (122)
10:35:37.467656 eth0  In  IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto UDP (17), length 150)
    10.212.32.2.domain > 10.212.32.183.2794: [udp sum ok] 57180 NXDomain q: AAAA? archive.ubuntu.com.dns.podman. 0/1/0 ns: . SOA a.root-servers.net. nstld.verisign-grs.com. 2022101002 1800 900 604800 86400 (122)
12:49:46.454362 eth0  Out IP (tos 0x0, ttl 62, id 20631, offset 0, flags [DF], proto UDP (17), length 71)
    10.212.32.183.10344 > 10.212.32.2.domain: [bad udp cksum 0x56a5 -> 0xa1d2!] 19343+ A? deb.debian.org.dns.podman. (43)
12:49:46.454412 eth0  Out IP (tos 0x0, ttl 62, id 20632, offset 0, flags [DF], proto UDP (17), length 71)
    10.212.32.183.10344 > 10.212.32.2.domain: [bad udp cksum 0x56a5 -> 0x72c7!] 24474+ AAAA? deb.debian.org.dns.podman. (43)
12:49:46.461578 eth0  In  IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto UDP (17), length 146)
    10.212.32.2.domain > 10.212.32.183.10344: [udp sum ok] 19343 NXDomain q: A? deb.debian.org.dns.podman. 0/1/0 ns: . SOA a.root-servers.net. nstld.verisign-grs.com. 2022101100 1800 900 604800 86400 (118)
12:49:46.465332 eth0  In  IP (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto UDP (17), length 146)
    10.212.32.2.domain > 10.212.32.183.10344: [udp sum ok] 24474 NXDomain q: AAAA? deb.debian.org.dns.podman. 0/1/0 ns: . SOA a.root-servers.net. nstld.verisign-grs.com. 2022101100 1800 900 604800 86400 (118)
12:49:46.465828 lo    In  IP (tos 0x0, ttl 63, id 11116, offset 0, flags [DF], proto UDP (17), length 74)
    127.0.0.1.starschool > 127.0.0.53.domain: [bad udp cksum 0xfe7d -> 0x0187!] 33163+ A? deb.debian.org.amp.nginx.com. (46)

It seems a bit weird that dns.podman appears in queries.
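
If I understand correctly, dns.podman is the default search domain that podman writes into the container's /etc/resolv.conf, so the resolver appends it while walking the search list; this can be checked from inside an affected container (the container name here is just a placeholder):

podman exec -i <container> cat /etc/resolv.conf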

Host instance is using systemd-resolved with a resolver address provided via DHCP from EC2 VPC. There are no issues with resolving on the host instance.

All our podman workloads are running in rootless mode.

Recently podman was upgraded from 4.1.1 to 4.2.1, netavark - from 1.0.3 to 1.1.0, aardvark-dns - from 1.0.3 to 1.1.0. I'm investigating whether downgrading to previous versions would help.

UPDATE: downgrading podman and aardvark-dns/netavark does not seem to help. Unfortunately, the same versions are not available in Fedora repos, so I checked with these:

aardvark-dns-1.0.2-1.fc36.x86_64
netavark-1.0.2-1.fc36.x86_64
podman-4.0.2-1.fc36.x86_64
podman-gvproxy-4.0.2-1.fc36.x86_64
podman-plugins-4.0.2-1.fc36.x86_64
podman-compose-1.0.3-3.fc36.noarch

The same resolution errors still appear. Investigating further.

DynamoFox commented 2 years ago

Could those of you affected by this issue tell whether journalctl -b | grep aard shows something like "unexpected end of input reached" at the time of the resolution failures? Especially when it comes to other containers' DNS resolution.

martinpitt commented 2 years ago

Running the reproducer, the first time I get a hang in iteration 5:

+ echo 'resolving test-5 in test-ctrl...'
resolving test-5 in test-ctrl...
+ podman exec -i test-ctrl python3
10.89.0.7
gethostbyname took 14.020793676376343 seconds

No journal messages aside from the initial

systemd[898]: Started run-r8248d1cf5c3841e09b1b1eb348ae9bb2.scope - /usr/libexec/podman/aardvark-dns --config /run/user/1000/containers/networks/aardvark-dns -p 53 run.

Another run was more severe:

+ echo 'resolving test-3 in test-ctrl...'
resolving test-3 in test-ctrl...
+ podman exec -i test-ctrl python3
gethostbyname took 28.11868190765381 seconds
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
socket.gaierror: [Errno -2] Name or service not known

but still no other journal messages.

In the meantime I upgraded to some newer Fedora 37 versions:

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

flouthoc commented 1 year ago

@martinpitt Were you able to reproduce this in netavark/aardvark with versions v1.2.0 onwards ?

martinpitt commented 1 year ago

@flouthoc : Yes, on Oct 21 above I already reported that this is still happening with 1.2.0. I also tested it again on current F37:

and it still happens. Does the reproducer not work for you? I have to run it a bunch of times (in the first run I was lucky), but it's been reliable over the last two months -- both on my own machine and a bog standard Fedora 37 cloud image in QEMU. Given that other people also report this, I don't believe this is specific to my setup.

flouthoc commented 1 year ago

@martinpitt No, the repro script works fine for me. But I am using netavark and aardvark from main.

[fl@fedora bin]$ ./repro.sh 
+ NETWORK=test
+ IMAGE=registry.fedoraproject.org/fedora
+ podman network rm --force --time 0 test
test
+ podman network create test
test
+ podman run -d --name test-ctrl --network test registry.fedoraproject.org/fedora sleep infinity
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 49822917e40a done  
Copying config 885d2b38b8 done  
Writing manifest to image destination
Storing signatures
fc39a271c10160528b26e73fc350b3f8c5e5ca78fe6096737f6816e339f023b5
++ seq 5
+ for i in $(seq 5)
+ podman run -d --name test-1 --network test registry.fedoraproject.org/fedora sleep infinity
77c77ca47b1ff0b4300bfd13ae279104c69d8089446b493ed9fc414a6138e0bb
+ sleep 1
+ echo 'resolving test-1 in test-ctrl...'
resolving test-1 in test-ctrl...
+ podman exec -i test-ctrl python3
10.89.0.4
gethostbyname took 0.007965564727783203 seconds
+ for i in $(seq 5)
+ podman run -d --name test-2 --network test registry.fedoraproject.org/fedora sleep infinity
10cd44d69fbaccae41406095f222359783979775e2cce66472314fc61c50f79d
+ sleep 1
+ echo 'resolving test-2 in test-ctrl...'
resolving test-2 in test-ctrl...
+ podman exec -i test-ctrl python3
10.89.0.5
gethostbyname took 0.003464937210083008 seconds
+ for i in $(seq 5)
+ podman run -d --name test-3 --network test registry.fedoraproject.org/fedora sleep infinity
81d51ea5f5311d66beffe09e26ee2ea2f05eea33d8e570bef2daf69a8b54fc69
+ sleep 1
+ echo 'resolving test-3 in test-ctrl...'
resolving test-3 in test-ctrl...
+ podman exec -i test-ctrl python3
10.89.0.6
gethostbyname took 0.004729032516479492 seconds
+ for i in $(seq 5)
+ podman run -d --name test-4 --network test registry.fedoraproject.org/fedora sleep infinity
3e9743cff62f6ab5a1623d5e95d516a9be0d57b828e2b0055ee0bd06fbea3240
+ sleep 1
+ echo 'resolving test-4 in test-ctrl...'
resolving test-4 in test-ctrl...
+ podman exec -i test-ctrl python3
10.89.0.7
gethostbyname took 0.003484964370727539 seconds
+ for i in $(seq 5)
+ podman run -d --name test-5 --network test registry.fedoraproject.org/fedora sleep infinity
d826b28f08cc43e97c6fdeecd78a5e5c3c1515320ebb062df9cd968325d75211
+ sleep 1
+ echo 'resolving test-5 in test-ctrl...'
resolving test-5 in test-ctrl...
+ podman exec -i test-ctrl python3
10.89.0.8
gethostbyname took 0.003689289093017578 seconds

flouthoc commented 1 year ago

I'll check again with exact versions and on some other platforms as well.

thrix commented 1 year ago

We were hit by this quite hard in Fedora CI.

The funny thing is we can reproduce the problem even without aardvark-dns, but it is a lot harder to hit.

Reproducer we used:

podman run --privileged --init --rm --entrypoint bash quay.io/fedora/fedora:latest -c "dnf -y install bind-utils; for i in {1..10000}; do host www.google.com || break; sleep 0.1; done"

Luap99 commented 1 year ago

Are you all running rootless or rootful?

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

martinpitt commented 1 year ago

rootless for me, as the issue description says.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

martinpitt commented 1 year ago

Still confirmed on current Fedora 37 with aardvark-dns-1.4.0-1.fc37.x86_64, netavark-1.4.0-1.fc37.x86_64, and podman-4.3.1-1.fc37.x86_64

rhatdan commented 1 year ago

Could you check this against podman 4.4, which was just released with new aardvark-dns and netavark?

martinpitt commented 1 year ago

Tested against current F37+proposed-updates with podman-4.4.1-1.fc37.x86_64, aardvark-dns-1.5.0-4.fc37.x86_64, netavark-1.5.0-4.fc37.x86_64. This still happens.

+ podman exec -i test-ctrl python3
gethostbyname took 40.041966915130615 seconds
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
socket.gaierror: [Errno -3] Temporary failure in name resolution

+ podman exec -i test-ctrl python3
10.89.0.5
gethostbyname took 10.015900611877441 seconds

In fact, it feels like it's even easier to reproduce than before. Out of the 5 iterations, only the first one succeeded quickly (0.008 s), one took very long (10 s), and three failed completely (socket.gaierror).

Luap99 commented 1 year ago

@martinpitt I cannot reproduce this with your provided script.

Do you have other containers running while testing this? Did you try this on a clean fedora install?

Can you share podman unshare --rootless-netns cat /etc/resolv.conf?

martinpitt commented 1 year ago

Do you have other containers running while testing this?

Yes, two toolboxes when I run the reproducer on my laptop, but not when I ran it on a standard F37 cloud image above.

Did you try this on a clean fedora install?

Yes, see above. Reproducer from scratch:

curl -L -o fedora.img https://download.fedoraproject.org/pub/fedora/linux/releases/37/Cloud/x86_64/images/Fedora-Cloud-Base-37-1.7.x86_64.qcow2
# nothing fancy, just admin:foobar and root:foobar
curl -L -O https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive file=fedora.img,if=virtio -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net user,hostfwd=tcp::2201-:22

Then you can log in as "admin:foobar" on the VT, or SSH in on localhost port 2201 for a more comfortable shell; the sudo password is also "foobar". Update the image (which is already quite old) and install the latest podman:

sudo dnf update -y
sudo dnf install -y --enablerepo=updates-testing podman
sudo reboot

Then

curl -L -O https://github.com/containers/podman/files/9664719/repr.sh.txt
sh repr.sh.txt

As I said, you may have to run this several times -- for me, in that fresh and isolated environment it only failed on the 4th attempt. This is likely dependent on background noise and/or timing. You can of course increase the seq in the script.
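
For example, one way to loop it (just an illustration; the run-*.log names are made up, and repr.sh.txt is the file downloaded above):

for n in $(seq 20); do
    echo "=== attempt $n ==="
    sh repr.sh.txt 2>&1 | tee "run-$n.log"
done
grep -l gaierror run-*.log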

podman unshare --rootless-netns cat /etc/resolv.conf

On my laptop:

search fritz.box redhat.com
nameserver 10.0.2.3
nameserver 192.168.178.1
nameserver fd00::7642:7fff:fe7d:4e93
nameserver 2003:c7:8f3c:d200:7642:7fff:fe7d:4e93

In the F37 VM above:

nameserver 10.0.2.3
nameserver 10.0.2.3

(i.e. SLIRP only)

Luap99 commented 1 year ago

My only guess would be a problem with IPv6. Maybe try to disable it temporarily and test again.
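
For example, one temporary way to do that on the host (these sysctl settings do not persist across a reboot):

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1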

martinpitt commented 1 year ago

The F37 VM does not have any public IPv6 addresses, only the link-local ones:

 ip -c a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
       valid_lft 86341sec preferred_lft 86341sec
    inet6 fec0::fd9a:cd18:64dd:6c40/64 scope site dynamic noprefixroute 
       valid_lft 86343sec preferred_lft 14343sec
    inet6 fe80::8277:600e:eb6d:3172/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[root@localhost ~]# ip route
default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100 
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100 

and no IPv6 nameservers either (see the comment above).

Luap99 commented 6 months ago

@martinpitt Do you still see this problem?

martinpitt commented 6 months ago

Indeed I don't. I ran the reproducer script 50 times. Cheers!