containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Rootfull containers on same network have inconsistent DNS name resolution #20072

Closed amd-isaac closed 10 months ago

amd-isaac commented 1 year ago

Issue Description

Running RHEL 8.8 and Podman 4.4.1 with the CNI network backend and the dnsname plugin installed. When containers that share the same network attempt to communicate with each other by name, the results are inconsistent.

Steps to reproduce the issue

  1. podman network create testnet
  2. podman run -dt --net testnet --name receiver docker.io/library/httpd
  3. Run podman run -it --rm --net testnet --name sender ubi8/ubi curl receiver:80 20 times (a loop sketch is shown below)
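
A minimal shell loop for step 3 (a sketch; it assumes the testnet network and receiver container from steps 1 and 2 already exist):

  # Repeat the sender run 20 times against the receiver container
  for i in $(seq 1 20); do
      podman run -it --rm --net testnet --name sender ubi8/ubi curl receiver:80
  done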

Describe the results you received

<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
curl: (6) Could not resolve host: receiver
curl: (6) Could not resolve host: receiver
curl: (6) Could not resolve host: receiver
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
curl: (6) Could not resolve host: receiver
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
curl: (6) Could not resolve host: receiver
<html><body><h1>It works!</h1></body></html>
curl: (6) Could not resolve host: receiver
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
<html><body><h1>It works!</h1></body></html>
curl: (6) Could not resolve host: receiver

Describe the results you expected

Expected all attempts by sender to communicate with receiver to succeed.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.6-1.module+el8.8.0+18098+9b44df5f.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: 8c4ab5a095127ecc96ef8a9c885e0e1b14aeb11b'
  cpuUtilization:
    idlePercent: 99.03
    systemPercent: 0.13
    userPercent: 0.84
  cpus: 4
  distribution:
    distribution: '"rhel"'
    version: "8.8"
  eventLogger: file
  hostname: myserver
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-477.10.1.el8_8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 353624064
  memTotal: 16526708736
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-1.module+el8.8.0+18060+3f21f2cc.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      spec: 1.0.2-dev
      go: go1.19.4
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_SYS_CHROOT,CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.8.0+18060+3f21f2cc.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 33541058560
  swapTotal: 33596370944
  uptime: 1858h 17m 31.00s (Approximately 77.42 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 11
    paused: 0
    running: 11
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 300631982080
  graphRootUsed: 11916087296
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 176
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.4.1
  Built: 1686839996
  BuiltTime: Thu Jun 15 10:39:56 2023
  GitCommit: ""
  GoVersion: go1.19.6
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

None

Additional information

The exact order and number of successes/failures appears random; rerunning 20 more times does not reproduce the same pattern. The same behavior is also seen on a separate RHEL 8.7 system running Podman 4.4.1 with the same network backend configuration.

flouthoc commented 1 year ago

Could you please paste the version of the CNI plugins being used as well? I am not sure whether CNI is recommended for this version of aardvark/netavark. @Luap99 could confirm this better.

Luap99 commented 1 year ago

Yes, as far as upstream is concerned, CNI support is deprecated and dnsname is basically EOL.

Please test with netavark + aardvark-dns instead. That said, I assume this is a simple race condition; if you add a sleep 1 before the curl, does it work better?
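
For example, the curl in step 3 could be wrapped so the sleep runs inside the container (a sketch of the suggested test):

  # Delay the lookup by one second inside the sender container
  podman run -it --rm --net testnet --name sender ubi8/ubi sh -c 'sleep 1; curl receiver:80'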

amd-isaac commented 1 year ago

@flouthoc here are the versions:

# /usr/libexec/cni/bridge --help
CNI bridge plugin version unknown
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
# /usr/libexec/cni/dnsname --help
CNI dnsname plugin
version: 1.4.0-dev
commit: 6685f68dbc13a95b73b9394b304927c6f518021c
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0

@Luap99 adding the sleep 1 before the curl does not fix the issue. Running the command manually within the container multiple times without restarting the container shows the same random failure. I also tried switching to the netavark backend and still see the same failure (running with netavark 1.5.1 and aardvark-dns 1.5.0).
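
For reference, the backend switch is configured in containers.conf (a sketch of the relevant setting; changing the backend typically also requires resetting existing Podman storage with podman system reset):

  # /etc/containers/containers.conf (or ~/.config/containers/containers.conf)
  [network]
  network_backend = "netavark"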

baude commented 1 year ago

Should this be taken to Bugzilla?

Luap99 commented 1 year ago

I think getting this into the RHEL channels would help with prioritization.

In any case, if this happens for both CNI and netavark, then I would not think it is a problem in the DNS server itself. Did you try to capture the packets and see where they get lost?
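
A capture like the ones in the next comment can be taken on the host bridge backing the testnet network; a minimal sketch, assuming the CNI bridge interface is named cni-podman1 (the actual name can be checked with ip link on the host):

  # Capture DNS, HTTP, and ARP traffic on the container bridge
  tcpdump -n -i cni-podman1 'port 53 or port 80 or arp'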

amd-isaac commented 12 months ago

@Luap99 here's a tcpdump of a successful (first trace) and an unsuccessful (second trace) connection attempt, from the perspective of the sender.

Successful connection:

20:52:04.835352 IP c3fbc69897c9.51345 > host.containers.internal.domain: 49549+ A? receiver.dns.podman. (37)
20:52:04.835385 IP c3fbc69897c9.51345 > host.containers.internal.domain: 43393+ AAAA? receiver.dns.podman. (37)
20:52:04.835488 IP host.containers.internal.domain > c3fbc69897c9.51345: 49549 1/0/0 A 10.89.0.46 (53)
20:52:04.835506 IP host.containers.internal.domain > c3fbc69897c9.51345: 43393 0/0/0 (37)
20:52:04.835665 IP c3fbc69897c9.35880 > 10.89.0.46.http: Flags [S], seq 3431928952, win 29200, options [mss 1460,sackOK,TS val 1511773294 ecr 0,nop,wscale 8], length 0
20:52:04.835688 IP 10.89.0.46.http > c3fbc69897c9.35880: Flags [S.], seq 1530778552, ack 3431928953, win 28960, options [mss 1460,sackOK,TS val 2943561629 ecr 1511773294,nop,wscale 8], length 0
20:52:04.835694 IP c3fbc69897c9.35880 > 10.89.0.46.http: Flags [.], ack 1, win 115, options [nop,nop,TS val 1511773294 ecr 2943561629], length 0
20:52:04.835715 IP c3fbc69897c9.35880 > 10.89.0.46.http: Flags [P.], seq 1:73, ack 1, win 115, options [nop,nop,TS val 1511773294 ecr 2943561629], length 72: HTTP: GET / HTTP/1.1
20:52:04.835735 IP 10.89.0.46.http > c3fbc69897c9.35880: Flags [.], ack 73, win 114, options [nop,nop,TS val 2943561629 ecr 1511773294], length 0
20:52:04.835918 IP 10.89.0.46.http > c3fbc69897c9.35880: Flags [P.], seq 1:271, ack 73, win 114, options [nop,nop,TS val 2943561630 ecr 1511773294], length 270: HTTP: HTTP/1.1 200 OK
20:52:04.835927 IP c3fbc69897c9.35880 > 10.89.0.46.http: Flags [.], ack 271, win 119, options [nop,nop,TS val 1511773295 ecr 2943561630], length 0
20:52:04.835984 IP c3fbc69897c9.35880 > 10.89.0.46.http: Flags [F.], seq 73, ack 271, win 119, options [nop,nop,TS val 1511773295 ecr 2943561630], length 0
20:52:04.836565 IP 10.89.0.46.http > c3fbc69897c9.35880: Flags [F.], seq 271, ack 74, win 114, options [nop,nop,TS val 2943561630 ecr 1511773295], length 0
20:52:04.836570 IP c3fbc69897c9.35880 > 10.89.0.46.http: Flags [.], ack 272, win 119, options [nop,nop,TS val 1511773295 ecr 2943561630], length 0

Could not resolve host:

20:52:09.140300 IP c3fbc69897c9.54750 > atldns02.amd.com.domain: 64723+ A? receiver.dns.podman. (37)
20:52:09.140349 IP c3fbc69897c9.54750 > atldns02.amd.com.domain: 29399+ AAAA? receiver.dns.podman. (37)
20:52:09.140802 IP atldns02.amd.com.domain > c3fbc69897c9.54750: 29399 NXDomain 0/1/0 (112)
20:52:09.140805 IP atldns02.amd.com.domain > c3fbc69897c9.54750: 64723 NXDomain 0/1/0 (112)
20:52:09.140858 IP c3fbc69897c9.52989 > host.containers.internal.domain: 64669+ A? receiver.amd.com. (34)
20:52:09.140883 IP c3fbc69897c9.52989 > host.containers.internal.domain: 52378+ AAAA? receiver.amd.com. (34)
20:52:09.141498 IP host.containers.internal.domain > c3fbc69897c9.52989: 64669 NXDomain* 0/1/0 (101)
20:52:09.141509 IP host.containers.internal.domain > c3fbc69897c9.52989: 52378 NXDomain* 0/1/0 (101)
20:52:09.141537 IP c3fbc69897c9.47764 > atldns01.amd.com.domain: 40416+ A? receiver. (26)
20:52:09.141556 IP c3fbc69897c9.47764 > atldns01.amd.com.domain: 37857+ AAAA? receiver. (26)
20:52:09.141910 IP atldns01.amd.com.domain > c3fbc69897c9.47764: 40416 NXDomain 0/1/0 (101)
20:52:09.141913 IP atldns01.amd.com.domain > c3fbc69897c9.47764: 37857 NXDomain 0/1/0 (101)
20:52:10.287892 ARP, Request who-has 10.89.0.46 tell c3fbc69897c9, length 28
20:52:10.287892 ARP, Request who-has host.containers.internal tell c3fbc69897c9, length 28
20:52:10.287885 ARP, Request who-has c3fbc69897c9 tell host.containers.internal, length 28
20:52:10.287908 ARP, Request who-has c3fbc69897c9 tell 10.89.0.46, length 28
20:52:10.287911 ARP, Reply c3fbc69897c9 is-at ae:b9:59:5b:e1:8e (oui Unknown), length 28
20:52:10.287912 ARP, Reply c3fbc69897c9 is-at ae:b9:59:5b:e1:8e (oui Unknown), length 28
20:52:10.287915 ARP, Reply host.containers.internal is-at 96:b9:d0:15:d4:97 (oui Unknown), length 28
20:52:10.287917 ARP, Reply 10.89.0.46 is-at f6:5c:9a:66:f2:56 (oui Unknown), length 28

amd-isaac commented 10 months ago

After debugging this issue further, it appears to be similar to #19515. The host's /etc/resolv.conf settings are imported into the container's /etc/resolv.conf, and the rotate option, in combination with the additional host nameservers, leads to inconsistent DNS resolution failures for inter-container communication. As a workaround, the container currently edits its own resolv.conf with sed to remove the offending rotate option.
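
The workaround looks roughly like this (a sketch; the resolv.conf contents in the comments are illustrative, not copied from the affected host):

  # Illustrative container /etc/resolv.conf as inherited from the host:
  #   search dns.podman example.com
  #   nameserver 10.89.0.1     <- container network resolver that knows "receiver"
  #   nameserver 192.0.2.10    <- host resolver, returns NXDOMAIN for container names
  #   options rotate
  #
  # With "rotate", queries alternate between the nameservers, so lookups of
  # "receiver" intermittently land on the host resolver and fail.
  # Strip the option inside the container before the application starts:
  sed -i '/^options/s/ rotate//' /etc/resolv.conf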

I have been unable to find a podman-specific solution to this problem with version 4.4.1; does this version have a way to prevent podman from importing the host's resolv.conf settings?

rhatdan commented 10 months ago

What if you volume mount in your own /etc/resolv.conf?
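
That approach might look like the following (a sketch; resolv.conf.podman is a hypothetical file containing only the container network's nameserver, e.g. a single line such as nameserver 10.89.0.1):

  # Override the container's resolv.conf with a hand-written one
  podman run -it --rm --net testnet --name sender \
      -v "$(pwd)/resolv.conf.podman:/etc/resolv.conf:ro" \
      ubi8/ubi curl receiver:80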

Luap99 commented 10 months ago

We allow --dns-search=. to unset the search domains, but --dns-option=. does not currently work, so this is something we could add to allow removing options.

However, what you describe should only apply to CNI, not netavark. With netavark, only the aardvark-dns IPs should be added to the container's resolv.conf, so even with the rotate option in the host's resolv.conf it should just work.
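
For reference, the existing search-domain reset is used like this with podman run (a sketch only; it clears the search domains but, as noted, there is currently no equivalent way to clear options):

  podman run -it --rm --net testnet --dns-search=. ubi8/ubi curl receiver:80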

amd-isaac commented 10 months ago

After further testing, it appears that with podman version 4.4.1, both the netavark and CNI backends have this issue. However, with version 4.6.1 the netavark backend does not have this issue (the CNI backend still does).

Presumably something changed between 4.4.1 and 4.6.1 that fixed this issue when running with netavark, as @Luap99 indicated. I would be interested in testing a --dns-option=. flag in the future, but for now I've been able to work around the issue.