containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Pasta: DNS (or just UDP) issues #21947

Closed hakong closed 6 months ago

hakong commented 6 months ago

Issue Description

DNS resolution fails intermittently when using pasta. This could be just DNS or UDP in general; I have not thoroughly tested other UDP services. The failure rate increases with either time or the amount of network traffic.

Steps to reproduce the issue

  1. Start two identical containers (rootless in my case), only one with --network=pasta
  2. Do lots of DNS requests from both containers
  3. DNS requests from container using pasta should start failing intermittently

podman run -d -v ~/data/:/data/:z --name=ubi9 ubi9 /data/test.sh
podman run -d -v ~/data-pasta/:/data/:z --name=ubi9-pasta --network=pasta ubi9 /data/test.sh

data/test.sh and data-pasta/test.sh:

#!/bin/bash

# Install necessary utilities
dnf install bind-utils -y

# Determine script's directory to output the results there
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

# Output file path
OUTPUT_FILE="$SCRIPT_DIR/network_results.csv"

echo "date,type,target,protocol,result" > "$OUTPUT_FILE"

# Run indefinitely
while true; do
    # Get current ISO 8601 date
    current_date=$(date --iso-8601=seconds)

    # DNS query to google.com using 8.8.8.8 over UDP
    if dig @8.8.8.8 google.com +noall +stats &> /dev/null; then
        echo "$current_date,DNS,google.com@8.8.8.8,UDP,Success" >> "$OUTPUT_FILE"
    else
        echo "$current_date,DNS,google.com@8.8.8.8,UDP,Failure" >> "$OUTPUT_FILE"
    fi

    # Wait a bit before the next operation
    sleep 1

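    # DNS query to google.com using 8.8.8.8 over TCP
    # Refresh the timestamp so this row is not logged with the previous query's time
    current_date=$(date --iso-8601=seconds)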
    if dig @8.8.8.8 google.com +tcp +noall +stats &> /dev/null; then
        echo "$current_date,DNS,google.com@8.8.8.8,TCP,Success" >> "$OUTPUT_FILE"
    else
        echo "$current_date,DNS,google.com@8.8.8.8,TCP,Failure" >> "$OUTPUT_FILE"
    fi

    # Wait a bit before the next operation
    sleep 1

    # DNS query to cloudflare.com using 1.1.1.1 over UDP
    current_date=$(date --iso-8601=seconds)
    if dig @1.1.1.1 cloudflare.com +noall +stats &> /dev/null; then
        echo "$current_date,DNS,cloudflare.com@1.1.1.1,UDP,Success" >> "$OUTPUT_FILE"
    else
        echo "$current_date,DNS,cloudflare.com@1.1.1.1,UDP,Failure" >> "$OUTPUT_FILE"
    fi

    # Wait a bit before the next operation
    sleep 1

    # DNS query to cloudflare.com using 1.1.1.1 over TCP
    current_date=$(date --iso-8601=seconds)
    if dig @1.1.1.1 cloudflare.com +tcp +noall +stats &> /dev/null; then
        echo "$current_date,DNS,cloudflare.com@1.1.1.1,TCP,Success" >> "$OUTPUT_FILE"
    else
        echo "$current_date,DNS,cloudflare.com@1.1.1.1,TCP,Failure" >> "$OUTPUT_FILE"
    fi

    # Wait a bit before the next operation
    sleep 1
done

Alternative method of testing:

[playground@container-2 ~]$ podman run -d --name=ubi9 ubi9 /data/test.sh
[playground@container-2 ~]$ podman run -d --name=ubi9-pasta --network=pasta ubi9 /data/test.sh

[playground@container-2 ~]$ podman ps
CONTAINER ID  IMAGE                                   COMMAND        CREATED      STATUS      PORTS       NAMES
c9ead8666be7  registry.access.redhat.com/ubi9:latest  /data/test.sh  3 hours ago  Up 3 hours              ubi9
ceed5aa5e253  registry.access.redhat.com/ubi9:latest  /data/test.sh  3 hours ago  Up 3 hours              ubi9-pasta

[playground@container-2 ~]$ podman inspect ubi9 | grep -i networkmode
               "NetworkMode": "slirp4netns",

[playground@container-2 ~]$ podman inspect ubi9-pasta | grep -i networkmode
               "NetworkMode": "pasta",

[playground@container-2 ~]$ podman exec -it ubi9 bash

[root@c9ead8666be7 /]# { success=0; failure=0; for i in $(seq 1 100); do if dig @8.8.8.8 google.com +time=3 +tries=1 +noall +answer | grep -q 'IN\sA'; then ((success++)); else ((failure++)); fi; done; echo "Success: $success, Failure: $failure"; }
Success: 100, Failure: 0
[root@c9ead8666be7 /]#
exit
[playground@container-2 ~]$ podman exec -it ubi9-pasta bash
[root@ceed5aa5e253 /]# { success=0; failure=0; for i in $(seq 1 100); do if dig @8.8.8.8 google.com +time=3 +tries=1 +noall +answer | grep -q 'IN\sA'; then ((success++)); else ((failure++)); fi; done; echo "Success: $success, Failure: $failure"; }
Success: 96, Failure: 4
[root@ceed5aa5e253 /]#
exit

Note that in this test only 4% of requests failed, but for a container that has been running for 20h with some amount of network traffic, the failure rate increases to 50-70%.
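For repeated runs, the one-liner above can be wrapped in a small function. This is only a sketch: `count_results` and `dns_probe` are hypothetical names that do not appear in the original report, and the probe reuses the same `dig` options as the transcript.

```shell
#!/bin/bash
# Run a command N times and count successes/failures by exit status.
# Usage: count_results N command [args...]
# (count_results is a hypothetical helper, not part of podman or pasta.)
count_results() {
    local n=$1
    shift
    local success=0 failure=0 i
    for ((i = 0; i < n; i++)); do
        if "$@" > /dev/null 2>&1; then
            success=$((success + 1))
        else
            failure=$((failure + 1))
        fi
    done
    echo "Success: $success, Failure: $failure"
}

# Hypothetical probe: same dig invocation as in the transcript above.
# Succeeds only if the answer section contains an A record.
dns_probe() {
    dig @8.8.8.8 google.com +time=3 +tries=1 +noall +answer | grep -q 'IN\sA'
}

# Example (run inside the container):
#   count_results 100 dns_probe
```

Running the same call inside both containers gives directly comparable numbers.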

Describe the results you received

[playground@container-2 ~]$ head -n 5000 data-pasta/network_results.csv | grep Success -c
4773
[playground@container-2 ~]$ head -n 5000 data-pasta/network_results.csv | grep Failure -c
104

Side by side tmux screenshot: image

Describe the results you expected

[playground@container-2 ~]$ head -n 5000 data/network_results.csv | grep Success -c
4999
[playground@container-2 ~]$ head -n 5000 data/network_results.csv | grep Failure -c
0
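Since the open question is whether only UDP is affected, the CSV can also be broken down per protocol instead of counted as a whole. A minimal sketch, assuming the column layout written by test.sh (`date,type,target,protocol,result`); `summarize_results` is a hypothetical helper name.

```shell
#!/bin/bash
# Print failed/total counts per protocol for a network_results.csv file.
# Assumes the header and column order produced by test.sh:
#   date,type,target,protocol,result
# (summarize_results is a hypothetical helper name.)
summarize_results() {
    awk -F, '
        NR == 1 { next }                 # skip the CSV header
        { total[$4]++ }                  # $4 is the protocol column
        $5 == "Failure" { failed[$4]++ } # $5 is the result column
        END {
            for (p in total)
                printf "%s: %d/%d failed (%.1f%%)\n",
                       p, failed[p], total[p], 100 * failed[p] / total[p]
        }
    ' "$1" | sort
}

# Example:
#   summarize_results data-pasta/network_results.csv
#   summarize_results data/network_results.csv
```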

podman info output

[playground@container-2 ~]$ podman info
host:
  arch: amd64
  buildahVersion: 1.31.3
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: aadb7c890ac6283eb4666d92690238e5fbdec5c7'
  cpuUtilization:
    idlePercent: 94.1
    systemPercent: 2.01
    userPercent: 3.89
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: '"rhel"'
    version: "9.3"
  eventLogger: file
  freeLocks: 2021
  hostname: container-2.redacted
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
  kernel: 5.14.0-362.18.1.el9_3.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 738545664
  memTotal: 3947732992
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-2.el9_3.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/user/1003/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20230818.g0af928e-4.el9.x86_64
    version: |
      pasta 0^20230818.g0af928e-4.el9.x86_64
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    path: /run/user/1003/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 96h 11m 8.00s (Approximately 4.00 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/playground/.config/containers/storage.conf
  containerStore:
    number: 27
    paused: 0
    running: 2
    stopped: 25
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/playground/.local/share/containers/storage
  graphRootAllocated: 26096939008
  graphRootUsed: 6625734656
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1003/containers
  transientStore: false
  volumePath: /home/playground/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1701529524
  BuiltTime: Sat Dec  2 15:05:24 2023
  GitCommit: ""
  GoVersion: go1.20.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

[root@container-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)
[root@container-2 ~]# uname -a
Linux container-2.redacted 5.14.0-362.18.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jan 3 15:54:45 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@container-2 ~]#
[root@container-2 ~]# dnf info passt
Failed to set locale, defaulting to C.UTF-8
Updating Subscription Management repositories.
Last metadata expiration check: 0:02:35 ago on Tue Mar  5 12:51:41 2024.
Installed Packages
Name         : passt
Version      : 0^20230818.g0af928e
Release      : 4.el9
Architecture : x86_64
Size         : 748 k
Source       : passt-0^20230818.g0af928e-4.el9.src.rpm
Repository   : @System
From repo    : rhel-9-for-x86_64-appstream-rpms
Summary      : User-mode networking daemons for virtual machines and namespaces
URL          : https://passt.top/
License      : GPLv2+ and BSD
Description  : passt implements a translation layer between a Layer-2 network interface and
             : native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
             : require any capabilities or privileges, and it can be used as a simple
             : replacement for Slirp.
             :
             : pasta (same binary as passt, different command) offers equivalent functionality,
             : for network namespaces: traffic is forwarded using a tap interface inside the
             : namespace, without the need to create further interfaces on the host, hence not
             : requiring any capabilities or privileges.
[playground@container-2 ~]$ podman version
Client:       Podman Engine
Version:      4.6.1
API Version:  4.6.1
Go Version:   go1.20.10
Built:        Sat Dec  2 15:05:24 2023
OS/Arch:      linux/amd64
[playground@container-2 ~]$

Additional information

Issue replicated with RHEL ubi containers and Debian-based containers as well.

Example from docker.io/louislam/uptime-kuma: image

[uptime-kuma@container-2 ~]$ podman ps
CONTAINER ID  IMAGE                                  COMMAND               CREATED       STATUS       PORTS                   NAMES
741613ac6b5d  docker.io/louislam/uptime-kuma:latest  node server/serve...  18 hours ago  Up 18 hours  0.0.0.0:3001->3001/tcp  uptime-kuma
e9b70914e961  docker.io/louislam/uptime-kuma:latest  node server/serve...  18 hours ago  Up 18 hours  0.0.0.0:3002->3001/tcp  uptime-kuma-pasta

After 18 hours of network traffic, everything that uses DNS lookups is failing. All 'green' services are using an IP address and not a hostname. The checks are a combination of SSH connectivity tests, TCP SYN/ACK tests, and HTTPS GET requests.

Immediately after restarting the pasta container, all service checks succeed and return to green status: image

podman-inspect-ubi9.txt podman-inspect-ubi9-pasta.txt

Luap99 commented 6 months ago

AFAIK @dgibson is working on some udp fixes in pasta right now. cc @sbrivio-rh

0^20230818.g0af928e

Note that this version is quite old considering that pasta is under active development so I suggest you try with the latest version first and see if it works better.

hakong commented 6 months ago

Graph of failure count (left axis) and ratio (right axis) for slirp4netns vs pasta. Given enough time and/or network traffic the failure ratio has reached >60% for me.

image
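A failure ratio over time like the one graphed can be approximated from the CSV alone. The sketch below buckets rows by hour, assuming the `date --iso-8601=seconds` timestamps written by test.sh; `failure_ratio_by_hour` is a hypothetical name and this is not necessarily how the graph above was produced.

```shell
#!/bin/bash
# Print "hour failed total ratio" per hour bucket for a network_results.csv.
# Assumes ISO 8601 timestamps in column 1 (e.g. 2024-03-05T12:51:41+00:00),
# so the first 13 characters identify the date and hour.
# (failure_ratio_by_hour is a hypothetical helper name.)
failure_ratio_by_hour() {
    awk -F, '
        NR == 1 { next }                          # skip the CSV header
        { hour = substr($1, 1, 13); total[hour]++ }
        $5 == "Failure" { failed[hour]++ }
        END {
            for (h in total)
                printf "%s %d %d %.2f\n", h, failed[h], total[h], failed[h] / total[h]
        }
    ' "$1" | sort
}

# Example:
#   failure_ratio_by_hour data-pasta/network_results.csv
```

A ratio that climbs from one hour to the next matches the behaviour described here, while a flat ratio would point to something other than container age.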

I had an issue with a tcp/udp mixed high-traffic container using pasta last month: it completely locked up the virtual machine it was running on (this happened a few times before I switched back to slirp4netns) and then flooded the hypervisor host with >3Gbps of traffic. Since it only had a 1Gbit physical interface, it self-DDoSed and I was unable to debug (even the shared IPMI interface timed out). At one point I was able to connect to the container VM and saw a pasta process using 100% CPU. This might be related.

AFAIK @dgibson is working on some udp fixes in pasta right now. cc @sbrivio-rh

0^20230818.g0af928e

Note that this version is quite old considering that pasta is under active development so I suggest you try with the latest version first and see if it works better.

I'm using the latest stable version included in rhel-9-for-x86_64-appstream-rpms. Is there a repo/rpm package I can install an updated version from?

Luap99 commented 6 months ago

you could try using the static rpm from here: https://passt.top/builds/latest/x86_64/

hakong commented 6 months ago

Are they selinux compatible?

[root@container-2 ~]# rpm -Uvh https://passt.top/builds/latest/x86_64/passt-g3b9098a-1.x86_64.rpm
Retrieving https://passt.top/builds/latest/x86_64/passt-g3b9098a-1.x86_64.rpm
error: Failed dependencies:
        passt = 0^20230818.g0af928e-4.el9 is needed by (installed) passt-selinux-0^20230818.g0af928e-4.el9.noarch

So, remove existing passt and passt-selinux, then install latest static rpm?

Luap99 commented 6 months ago

Sorry, I do not know how they are built and have never used them. I will refer to @sbrivio-rh in this case.

hakong commented 6 months ago

Seems to work:

dnf remove passt passt-selinux
rpm -Uvh https://passt.top/builds/latest/x86_64/passt-g3b9098a-1.x86_64.rpm

sbrivio-rh commented 6 months ago

I'm using the latest stable version included in rhel-9-for-x86_64-appstream-rpms. Is there a repo/rpm package I can install an updated version from?

EPEL 9 and CentOS Stream 9 packages (they should all be compatible with SELinux's base policy) are available from: https://copr.fedorainfracloud.org/coprs/sbrivio/passt/. If the static RPM build works for you, at least for testing purposes, you can use them too.

CentOS Stream already has a rebased package (see https://gitlab.com/redhat/centos-stream/rpms/passt) which includes several fixes for issues similar to what you're observing.

dgibson commented 6 months ago

I strongly suspect this is upstream bug 57. The 2023_08_18 release doesn't have the fix for it and it caused very much the same issue as here: gradually increasing failure rates for UDP traffic the longer the container stayed around.

It was fixed with this commit, which is included in the 2023_11_07 and later releases.
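A quick way to tell whether an installed package is new enough is to compare the snapshot date embedded in its version string against 2023_11_07. This is a sketch only: `passt_has_udp_fix` is a hypothetical helper, and the `<date>.g<hash>` pattern is inferred from the version strings quoted in this thread (for example `0^20230818.g0af928e`). Builds without a date in the name, such as the static `passt-g3b9098a-1` RPM, are reported as unrecognized.

```shell
#!/bin/bash
# Return 0 if the passt version string carries a snapshot date of
# 2023-11-07 or later, 1 if it is older, 2 if no date can be found.
# (passt_has_udp_fix is a hypothetical helper; the version format is an
# assumption based on strings such as "0^20230818.g0af928e".)
passt_has_udp_fix() {
    local version=$1
    local re='([0-9]{8})\.g[0-9a-f]+'
    if [[ $version =~ $re ]]; then
        # Force base 10 so the date is never parsed as octal.
        (( 10#${BASH_REMATCH[1]} >= 20231107 ))
    else
        return 2
    fi
}

# Example:
#   passt_has_udp_fix "$(rpm -q passt)" && echo "fix included" || echo "older or unknown"
```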

hakong commented 6 months ago

I'm surprised to see such a broken package in RHEL's repos. I suggested to my colleagues at work that we switch our production container hosts to pasta, given its benefits. Good thing I tested this at home first.

Will the RHEL package be updated soon?

hakong commented 6 months ago

EPEL 9 and CentOS Stream 9 packages (they should all be compatible with SELinux's base policy) are available from: https://copr.fedorainfracloud.org/coprs/sbrivio/passt/. If the static RPM build works for you, at least for testing purposes, you can use them too.

CentOS Stream already has a rebased package (see https://gitlab.co

It installed but with some selinux warnings.


Dependencies resolved.
============================================================================================================================================================================================ 
Package                             Architecture                 Version                                          Repository                                                          Size
============================================================================================================================================================================================
Installing:
 passt                               x86_64                       0^20240220.g1e6f92b-1.el9                        copr:copr.fedorainfracloud.org:sbrivio:passt                       185 k
Installing dependencies:
 passt-selinux                       noarch                       0^20240220.g1e6f92b-1.el9                        copr:copr.fedorainfracloud.org:sbrivio:passt                        32 k

Transaction Summary
============================================================================================================================================================================================
Install  2 Packages

Total download size: 217 k
Installed size: 960 k
Is this ok [y/N]: y
Downloading Packages:
(1/2): passt-selinux-0^20240220.g1e6f92b-1.el9.noarch.rpm                                                                                                    63 kB/s |  32 kB     00:00
(2/2): passt-0^20240220.g1e6f92b-1.el9.x86_64.rpm                                                                                                           296 kB/s | 185 kB     00:00
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                       344 kB/s | 217 kB     00:00     
Copr repo for passt owned by sbrivio                                                                                                                        4.3 kB/s | 1.0 kB     00:00
Importing GPG key 0xF021CB9A:
 Userid     : "sbrivio_passt (None) <sbrivio#passt@copr.fedorahosted.org>"
 Fingerprint: E351 69FA D8EE 08F6 C0EF F84A F404 8A96 F021 CB9A
 From       : https://download.copr.fedorainfracloud.org/results/sbrivio/passt/pubkey.gpg
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                          1/1
  Installing       : passt-0^20240220.g1e6f92b-1.el9.x86_64                   1/2
  Running scriptlet: passt-selinux-0^20240220.g1e6f92b-1.el9.noarch           2/2
  Installing       : passt-selinux-0^20240220.g1e6f92b-1.el9.noarch           2/2
  Running scriptlet: passt-selinux-0^20240220.g1e6f92b-1.el9.noarch           2/2
Failed to resolve allow statement at /var/lib/selinux/targeted/tmp/modules/200/passt/cil:103
Failed to resolve AST
/usr/sbin/semodule:  Failed!
Failed to resolve allow statement at /var/lib/selinux/targeted/tmp/modules/200/pasta/cil:104
Failed to resolve AST
/usr/sbin/semodule:  Failed!

  Verifying        : passt-0^20240220.g1e6f92b-1.el9.x86_64                   1/2
  Verifying        : passt-selinux-0^20240220.g1e6f92b-1.el9.noarch           2/2
Installed products updated.

Installed:
  passt-0^20240220.g1e6f92b-1.el9.x86_64                                                   passt-selinux-0^20240220.g1e6f92b-1.el9.noarch

Complete!

sbrivio-rh commented 6 months ago

I'm surprised to see such a broken package in RHEL's repos. I suggested to my colleagues at work that we switch our production container hosts to pasta, given its benefits. Good thing I tested this at home first.

It's not broken for the supported, typical use case in RHEL at that point, that is, virtual machines (and passt(1)). However, if that's a priority for you, please file an issue.

Will the RHEL package be updated soon?

Yes, it's already updated and pending release as I mentioned: passt-0^20231204.gb86afe3-1.el9.

Details about the SELinux scriptlet failure you're seeing are at https://bugzilla.redhat.com/show_bug.cgi?id=2237996: the base policy installed on your system is too old to support new SELinux rules we added in the meantime. The Fedora 37 package had a patch for that: https://src.fedoraproject.org/rpms/passt/blob/f37/f/0001-selinux-Drop-user_namespace-class-rules-for-Fedora-3.patch.