containers / aardvark-dns

Authoritative DNS server for A/AAAA container records. Forwards other requests to the host's /etc/resolv.conf.
Apache License 2.0

Publishing a UDP range of more than 16383 ports ending at 65535 breaks DNS resolution on user-defined networks with rootful networking. #473

Open · TheSiman opened this issue 3 months ago

TheSiman commented 3 months ago

Issue Description

When publishing a UDP range of more than 16383 ports that ends at 65535, DNS resolution on user-defined networks stops working. I could only reproduce it with larger ranges ending at 65535. (The intended use case was coturn.)

  1. 49152-65535:49152-65535/udp (doesn't work)
  2. 49152-65535:49152-65535/tcp (TCP, works)
  3. 49153-65535:49153-65535/udp (one port smaller, works)
  4. 45000-65534:45000-65534/udp (much larger range, but ends one port before 65535, works)
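Counting the ports in each case makes the pattern easier to see. The sketch below is plain arithmetic, not part of podman or aardvark-dns, and the outcomes are copied from the list above. The failing range 49152-65535 is exactly the IANA dynamic/private port range (RFC 6335) and holds 16384 ports; dropping a single port brings it to 16383.

```rust
/// Number of ports in the inclusive range start..=end.
fn range_len(start: u16, end: u16) -> u32 {
    u32::from(end) - u32::from(start) + 1
}

fn main() {
    // (published range, first port, last port, outcome reported in the issue)
    let cases = [
        ("49152-65535/udp", 49152u16, 65535u16, "doesn't work"),
        ("49152-65535/tcp", 49152, 65535, "works"),
        ("49153-65535/udp", 49153, 65535, "works"),
        ("45000-65534/udp", 45000, 65534, "works"),
    ];
    for (label, start, end, outcome) in cases {
        println!(
            "{label}: {} ports, includes 65535: {}, {outcome}",
            range_len(start, end),
            end == u16::MAX
        );
    }
}
```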

Steps to reproduce the issue

  1. podman network create debug
  2. podman run --rm -it --publish 49152-65535:49152-65535/udp docker.io/debian (leave it running)
  3. In a second terminal: podman run --rm -it --network=debug docker.io/debian apt update

Describe the results you received

DNS requests inside the container time out.

Describe the results you expected

DNS requests inside the container get a response.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.4
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 96.41
    systemPercent: 2.57
    userPercent: 1.03
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: coreos
    version: "40"
  eventLogger: journald
  freeLocks: 2047
  hostname: uncontrol-old
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.8.9-300.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 4969246720
  memTotal: 8319741952
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-3.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240510.g7288448-1.fc40.x86_64
    version: |
      pasta 0^20240510.g7288448-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-2.fc40.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 30m 58.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 171260751872

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

QEMU/KVM (netcup.eu)

Additional information

No response

Luap99 commented 3 months ago

This happens because the published ports exhaust the ephemeral port range on the host, so aardvark-dns can no longer make DNS requests of its own. At least, that is the error condition.

It is not clear from the strace, but it looks like aardvark-dns requests a random port on bind instead of port 0, for which the kernel would assign a random free port itself:

bind(10, {sa_family=AF_INET, sin_port=htons(53788), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
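To make the pattern concrete, here is a minimal std-only Rust sketch of "pick a random port, bind, retry on EADDRINUSE". It is not hickory-dns's code: the names `XorShift` and `bind_random_port`, the 49152-65535 candidate range and the attempt limit are illustrative assumptions, and the tiny PRNG exists only so the example needs no external crates.

```rust
use std::io::{self, ErrorKind};
use std::net::{Ipv4Addr, SocketAddrV4, UdpSocket};
use std::time::{SystemTime, UNIX_EPOCH};

/// Minimal xorshift64 PRNG (illustrative only, not cryptographically secure).
struct XorShift(u64);

impl XorShift {
    fn new(seed: u64) -> Self {
        XorShift(seed | 1) // state must never be zero
    }
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Pseudo-random port in lo..=hi (requires lo <= hi; small modulo bias ignored).
    fn port_in(&mut self, lo: u16, hi: u16) -> u16 {
        let span = u64::from(hi - lo) + 1;
        lo + ((self.next() >> 32) % span) as u16
    }
}

/// Hypothetical helper: bind a random port in lo..=hi, retrying on EADDRINUSE.
/// Returns the bound socket and the number of attempts it took.
fn bind_random_port(
    lo: u16,
    hi: u16,
    max_attempts: u32,
    rng: &mut XorShift,
) -> io::Result<(UdpSocket, u32)> {
    for attempt in 1..=max_attempts {
        let addr = SocketAddrV4::new(Ipv4Addr::UNSPECIFIED, rng.port_in(lo, hi));
        match UdpSocket::bind(addr) {
            Ok(sock) => return Ok((sock, attempt)),
            // Port already taken (e.g. by a published range): draw another one.
            Err(e) if e.kind() == ErrorKind::AddrInUse => continue,
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(ErrorKind::AddrInUse, "no free port found"))
}

fn main() {
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(42);
    let mut rng = XorShift::new(seed);
    match bind_random_port(49152, 65535, 100, &mut rng) {
        Ok((sock, n)) => println!("bound {:?} after {n} attempt(s)", sock.local_addr()),
        Err(e) => println!("giving up: {e}"),
    }
}
```

Every occupied candidate costs one failed bind() call, which is what shows up as EADDRINUSE lines in an strace.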

So this is likely a bug in aardvark-dns: it should just bind port 0, because according to the kernel there is still plenty of room:
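A minimal sketch of the suggested alternative, written against Rust's standard library rather than hickory-dns's own socket abstraction (`bind_kernel_chosen` is a hypothetical name): bind to port 0 and read the assigned port back with `local_addr()`. The port choice is then left to the kernel, so no retry loop is needed in user space.

```rust
use std::io;
use std::net::UdpSocket;

/// Bind a UDP socket to port 0 so the kernel picks a free port itself
/// (on Linux, from net.ipv4.ip_local_port_range) instead of the caller guessing.
fn bind_kernel_chosen() -> io::Result<UdpSocket> {
    UdpSocket::bind("0.0.0.0:0")
}

fn main() -> io::Result<()> {
    let a = bind_kernel_chosen()?;
    let b = bind_kernel_chosen()?;
    // local_addr() reports the port the kernel actually assigned.
    println!("first socket:  {}", a.local_addr()?);
    println!("second socket: {}", b.local_addr()?);
    Ok(())
}
```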

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768	60999
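The "plenty of room" argument can be checked with a little arithmetic. This sketch (helper names are hypothetical) parses the file's two numbers and counts how many of the kernel's ephemeral ports the published 49152-65535 UDP range occupies. With the 32768-60999 range shown above, 11848 of 28232 ports overlap, leaving 16384 for the kernel to hand out.

```rust
use std::fs;
use std::ops::RangeInclusive;

/// Parse the contents of /proc/sys/net/ipv4/ip_local_port_range:
/// two whitespace-separated port numbers such as "32768\t60999\n".
fn parse_port_range(contents: &str) -> Option<RangeInclusive<u16>> {
    let mut fields = contents.split_whitespace();
    let lo: u16 = fields.next()?.parse().ok()?;
    let hi: u16 = fields.next()?.parse().ok()?;
    (lo <= hi).then(|| lo..=hi)
}

/// Number of ports in `range`.
fn range_size(range: &RangeInclusive<u16>) -> u32 {
    u32::from(*range.end()) - u32::from(*range.start()) + 1
}

/// Number of ports shared by `range` and the published range lo..=hi.
fn overlap(range: &RangeInclusive<u16>, lo: u16, hi: u16) -> u32 {
    let start = (*range.start()).max(lo);
    let end = (*range.end()).min(hi);
    if start > end { 0 } else { u32::from(end) - u32::from(start) + 1 }
}

fn main() {
    let path = "/proc/sys/net/ipv4/ip_local_port_range";
    let eph = match fs::read_to_string(path).ok().and_then(|s| parse_port_range(&s)) {
        Some(r) => r,
        None => {
            eprintln!("could not read {path} (not Linux?)");
            return;
        }
    };
    let used = overlap(&eph, 49152, 65535);
    println!(
        "ephemeral {}-{}: {} ports, {} taken by the published range, {} left",
        eph.start(), eph.end(), range_size(&eph), used, range_size(&eph) - used
    );
}
```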

TheSiman commented 3 months ago

Sorry, I don't completely follow. If that were the case, why would freeing up port 65535 fix it? That port is well outside ip_local_port_range, and the chance of it being picked at random seems extremely low.

Luap99 commented 3 months ago

I agree that it seems odd that we hit 65535 randomly, but I looked at the code and this is what our library is doing. It also does not respect ip_local_port_range at all.

https://github.com/hickory-dns/hickory-dns/blob/f1489da675c21fddc189f2c9505bc9da6c156835/crates/proto/src/udp/udp_stream.rs#L291-L339

I have no idea why they did this instead of just binding to port 0 and letting the kernel pick the port.

Luap99 commented 3 months ago

Note that the code I linked already contains the new port range, which is not in a release yet, so we do not use it: https://github.com/hickory-dns/hickory-dns/commit/8f05d14eed53c488ee14fa860ef68cad352168ff

And looking at the strace output, there are thousands of bind retries, so it makes sense that it eventually hit the one free port.
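A rough simulation shows why thousands of retries are plausible. It assumes, purely for illustration, that candidate ports are drawn uniformly from 49152-65535 and that only the published ports are occupied; the real candidate range and retry limit are whatever the hickory-dns version in use specifies. In this model, with 49152-65534 published only 65535 is free, so each draw succeeds with probability 1/16384 and about 16384 attempts are needed on average; with 49152-65535 published, no draw can succeed. `XorShift` and `attempts_until_free` are hypothetical names, and no real sockets are used.

```rust
/// Minimal xorshift64 PRNG (illustrative only, not cryptographically secure).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Pseudo-random port in lo..=hi (requires lo <= hi; small modulo bias ignored).
    fn port_in(&mut self, lo: u16, hi: u16) -> u16 {
        let span = u64::from(hi - lo) + 1;
        lo + ((self.next() >> 32) % span) as u16
    }
}

/// Draw ports from lo..=hi until one lies outside the occupied range
/// occ_lo..=occ_hi. Returns the attempt number of the first free draw,
/// or None if all max_attempts draws landed on occupied ports.
fn attempts_until_free(
    lo: u16,
    hi: u16,
    occ_lo: u16,
    occ_hi: u16,
    max_attempts: u32,
    rng: &mut XorShift,
) -> Option<u32> {
    (1..=max_attempts).find(|_| {
        let p = rng.port_in(lo, hi);
        p < occ_lo || p > occ_hi
    })
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15); // any nonzero seed
    let trials: u64 = 200;
    let mut total: u64 = 0;
    for _ in 0..trials {
        // 49152-65534 published: only 65535 is free.
        let n = attempts_until_free(49152, 65535, 49152, 65534, u32::MAX, &mut rng)
            .expect("a free port exists, so some draw eventually succeeds");
        total += u64::from(n);
    }
    println!("one free port: mean {} attempts over {trials} trials", total / trials);

    // 49152-65535 published: every candidate is taken.
    let none = attempts_until_free(49152, 65535, 49152, 65535, 100_000, &mut rng);
    println!("no free port: {:?} after 100000 draws", none);
}
```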

Luap99 commented 2 months ago

I opened https://github.com/hickory-dns/hickory-dns/pull/2260 upstream.