containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

systemd generated podman units won't have proper network configuration #20496

Closed (deajan closed this issue 11 months ago)

deajan commented 11 months ago

Issue Description

I'm running a rootless podman container on a RHEL 9 machine. I've created a systemd unit file for the container using podman generate systemd mycontainer > /etc/systemd/system/mycontainer.service

On every system reboot, my container isn't reachable from the host:

# podman ps
CONTAINER ID  IMAGE                                COMMAND     CREATED         STATUS         PORTS                   NAMES
77f1cfe5c95d  docker.io/vaultwarden/server:latest  /start.sh   42 seconds ago  Up 43 seconds  0.0.0.0:8080->8080/tcp  mycontainer
# curl -m 4 127.0.0.1:8080
curl: (28) Operation timed out after 4002 milliseconds with 0 bytes received

I have to run podman network reload mycontainer for the container to become reachable.

# podman network reload mycontainer
ERRO[0000] tearing down network namespace configuration for container 77f1cfe5c95dcd4a5d04b29d92ac61e0c286248aea1f30a542047f2b573595f7: netavark: code: 1, msg: iptables: No chain/target/match by that name.
77f1cfe5c95dcd4a5d04b29d92ac61e0c286248aea1f30a542047f2b573595f7
# curl -m 4 127.0.0.1
<!doctype html>[...]

Searching around, I noticed that firewalld doesn't have the container network in the trusted zone, and iptables has no forwarding rules until I reload the networks using podman. Before podman network reload mycontainer:

# firewall-cmd --list-all --zone=trusted
trusted
  target: ACCEPT
  icmp-block-inversion: no
  interfaces:
  sources:
  services:
  ports:
  protocols:
  forward: yes
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

After podman network reload mycontainer

# firewall-cmd --list-all --zone=trusted
trusted (active)
  target: ACCEPT
  icmp-block-inversion: no
  interfaces:
  sources: 10.88.0.0/16
  services:
  ports:
  protocols:
  forward: yes
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
NETAVARK_FORWARD  all  --  anywhere             anywhere             /* netavark firewall plugin rules */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain NETAVARK_FORWARD (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             10.88.0.0/16         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.88.0.0/16         anywhere

So my wild guess is that the podman-generated systemd unit doesn't perform the necessary network setup steps at boot time.

Steps to reproduce the issue

  1. Create a podman container listening on any port
  2. Create a systemd service using podman generate systemd mycontainer > /etc/systemd/system/mycontainer.service
  3. Enable the service with systemctl enable --now mycontainer
  4. Reboot
  5. Try to connect to the container (should fail)
  6. Check whether the necessary iptables / firewalld rules were created
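
Step 6 can be scripted; a minimal check (assuming the netavark iptables firewall driver, whose NETAVARK_FORWARD chain name appears in the iptables dumps above):

```shell
# Check whether netavark's forward chain is referenced after reboot;
# if iptables is unavailable or the chain is absent, report "missing".
if iptables -nL FORWARD 2>/dev/null | grep -q NETAVARK_FORWARD; then
  echo "netavark forward rules: present"
else
  echo "netavark forward rules: missing"
fi
# Also check whether the container subnet was added to the trusted zone
firewall-cmd --list-sources --zone=trusted 2>/dev/null || true
```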

Describe the results you received

As described, missing iptables and firewalld rules.

Describe the results you expected

Automatic iptables and firewalld rules creation on service start.

I do understand that running firewall-cmd --reload would reset these rules, but here they are already missing at startup.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-1.el9_2.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: fab2fef7227d2dc16478d29f1185953f81451702'
  cpuUtilization:
    idlePercent: 98.18
    systemPercent: 0.98
    userPercent: 0.84
  cpus: 2
  distribution:
    distribution: '"almalinux"'
    version: "9.2"
  eventLogger: journald
  hostname: pw.netperfect.eu
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-284.30.1.el9_2.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 3597586432
  memTotal: 4095905792
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-1.el9_2.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-3.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 1610608640
  swapTotal: 1610608640
  uptime: 0h 5m 2.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 13410238464
  graphRootUsed: 8915861504
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.4.1
  Built: 1694517849
  BuiltTime: Tue Sep 12 13:24:09 2023
  GitCommit: ""
  GoVersion: go1.19.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Here's my systemd service file generated via podman

# container-mycontainer.service
# autogenerated by Podman 4.4.1
# Wed May 24 22:54:44 CEST 2023

[Unit]
Description=Podman container-mycontainer.service
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=always
TimeoutStopSec=70
ExecStart=/usr/bin/podman run \
        --cidfile=%t/%n.ctr-id \
        --cgroups=no-conmon \
        --rm \
        --sdnotify=conmon \
        --replace \
        -u 1001:1001 \
        -d \
        --name mycontainer \
        -p 8080:8080 \
        -v /data/:/data/:Z \
        --env-file=/data/.env \
        -e TZ=Europe/Paris \
        --label io.containers.autoupdate=registry docker.io/myprovider/server:latest
ExecStop=/usr/bin/podman stop \
        --ignore -t 10 \
        --cidfile=%t/%n.ctr-id
ExecStopPost=/usr/bin/podman rm \
        -f \
        --ignore -t 10 \
        --cidfile=%t/%n.ctr-id
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

Additional information

I have this setup on two servers, one running RHEL 9.2 and the other AlmaLinux 9.2, both with podman 4.4.1. Both exhibit almost exactly the same behavior.

Luap99 commented 11 months ago

Did you make sure your unit is started after firewalld? On start firewalld will also flush all rules so the container unit must be started after that.
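
For reference, that ordering can be added without editing the generated file, via a systemd drop-in (a sketch; the drop-in file name is arbitrary, and the path assumes the unit name used earlier in this issue):

```ini
# /etc/systemd/system/mycontainer.service.d/10-after-firewalld.conf
[Unit]
# Start the container only once firewalld has finished loading, so its
# rule flush at startup cannot undo the rules netavark has just created.
After=firewalld.service
Wants=firewalld.service
```

A systemctl daemon-reload is needed afterwards for the drop-in to take effect.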

deajan commented 11 months ago

I've added After=firewalld.service to my systemd file, and then rebooted the machine. Same issue.

Luap99 commented 11 months ago

Did you actually make sure no other service is flushing the iptables rules afterwards? Podman does not delete anything on its own. You have to figure out what is clearing your iptables rules and then start the podman units after that. In any case this is not a podman bug, so I am closing this, but feel free to continue the discussion.
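
One way to find out what is clearing the rules (a sketch, assuming auditd is installed; the key name ipt-flush is arbitrary) is to put an execution watch on the firewall binaries and search the audit log after the next reboot:

```shell
# Put an execute watch on the binaries that can flush rules; each hit is
# logged by auditd and can be traced back to the calling service.
for bin in /usr/sbin/iptables /usr/sbin/iptables-restore /usr/sbin/nft; do
  auditctl -w "$bin" -p x -k ipt-flush 2>/dev/null \
    || echo "could not watch $bin (auditctl unavailable or not root)"
done
# After the next reboot, list every execution since boot:
# ausearch -k ipt-flush --start boot
```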

deajan commented 10 months ago

@Luap99 The servers I installed podman on only run podman and mysql, so I don't think anything would reload the firewalld/iptables rules.

I've pushed the analysis a bit further. Both servers don't show the same behavior actually (sorry, wasn't precise enough).

Server 2's special hell case: surprisingly, of the 17 instances, only some cannot be reached. Each instance uses a local port (8001-8017) which is forwarded to the podman instance's port 8080 (e.g. -p 8002:8080). When I curl localhost:8001 through 8017, some containers don't respond until I run podman network reload --all. Again, I have no clue where to search. I've looked at the bridge slaves; they are all up. I can also ping each container IP without problem. I just cannot reach some containers on the forwarded local port.

EDIT: The containers that cannot be reached aren't random, they're always the same ones. As said, they run the same software as the others and are configured the same way, except for the local port. I compared the systemd unit files; the only differences are the names and the mapped ports. I checked the SELinux logs: nothing. I checked dmesg: only the generic podman0 port x(vethy) entered forwarding state messages. Container logs show the application inside the container is running.

If I podman exec -it <container> bash and then execute curl localhost:8080, the application responds properly. Interestingly, from inside the container I can also see the sql server using curl host.containers.internal:3306. Of course, using an HTTP client against an SQL server produces an error (I don't have other tools in the container), but at least I know the SQL server is reachable.

Using ss -lataupen, I see the ports that don't work listening. E.g., one container that doesn't work has app port 8080 mapped to port 8006, and I do see local port 8006 listening on the host server:

tcp   LISTEN     26     4096                  0.0.0.0:8006                 0.0.0.0:*     users:(("conmon",pid=1671,fd=6)) ino:26454 sk:b cgroup:/system.slice/demo.service <->
tcp   ESTAB      199    0               192.168.161.1:8006           192.168.201.4:38412 ino:0 sk:1003 cgroup:/system.slice/example.service <->
tcp   CLOSE-WAIT 79     0                   127.0.0.1:8006               127.0.0.1:37688 ino:0 sk:25 cgroup:/system.slice/example.service -->
tcp   CLOSE-WAIT 79     0                   127.0.0.1:8006               127.0.0.1:40564 ino:0 sk:29 cgroup:/system.slice/example.service -->

So each container app works and can reach the host sql server, but some cannot be reached from the host server, without any proper explanation.

I must admit I'm all out of things to check. Any ideas, perhaps? I'm pretty lost, as I've used all the server-fu I know.
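
The manual curl checks above can be scripted to record, after each reboot, which containers answer (a sketch; 8001-8017 are the host-side ports described above, and the curl timeout mirrors the one used earlier in the thread):

```shell
# Probe each mapped host port and report which containers answer.
for port in $(seq 8001 8017); do
  if curl -s -m 2 -o /dev/null "http://127.0.0.1:${port}"; then
    echo "port ${port}: OK"
  else
    echo "port ${port}: NO RESPONSE"
  fi
done
```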

deajan commented 10 months ago

I've even tried to see whether firewall rules change:

# reboot machine
# good container
curl localhost:8001 -m 4
<!doctype html>[...]
# bad container
curl localhost:8003 -m 4
curl: (28) Operation timed out after 4001 milliseconds with 0 bytes received
ip a > before_ipa
ss -lataupn > before_ss
firewall-cmd --list-all-zones > before_fw
iptables -L > before_iptables

# Now restart network
podman network reload --all

# good container
curl localhost:8001 -m 4
<!doctype html>[...]
# bad container now works too
curl localhost:8003 -m 4
<!doctype html>[...]
ip a > after_ipa
ss -lataupn > after_ss
firewall-cmd --list-all-zones > after_fw
iptables -L > after_iptables

Running a diff on the iptables and firewall files shows that nothing changed. The other diffs didn't show anything shocking to me either.
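
One caveat about those dumps (an assumption about this setup, given the netavark/iptables backend): iptables -L only prints the filter table, while the DNAT rules that implement the port forwards live in the nat table, so a fuller before/after comparison would capture both:

```shell
# Capture both the filter and nat tables; the nat table holds the DNAT
# port-forwarding rules that a plain `iptables -L` does not show.
for table in filter nat; do
  iptables -t "$table" -nL > "before_${table}.txt" 2>/dev/null || true
  echo "captured table: ${table}"
done
# ...then reload the networks, dump again to after_*.txt, and diff each pair
```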

Another thing: after a couple of test reboots, I can confirm that the containers that won't respond are indeed random (contrary to what I wrote earlier). Containers that didn't work after a reboot now work, and others stopped working until I reloaded the networks with podman.

Really needing advice here.

deajan commented 9 months ago

Ahem... any ideas, perhaps?

deajan commented 8 months ago

Okay, it looks like the latest update batch, containing the following, resolved my issue:

firewalld-1.2.1-1.el9.noarch to firewalld-1.2.5-2.el9_3.noarch
firewalld-filesystem-1.2.1-1.el9.noarch to firewalld-filesystem-1.2.5-2.el9_3.noarch
netavark-1.7.0-1.el9.x86_64 to netavark-1.7.0-2.el9_3.x86_64
podman-4.6.1-5.el9.x86_64 to podman-4.6.1-7.el9_3.x86_64
python3-firewall-1.2.1-1.el9.noarch to python3-firewall-1.2.5-2.el9_3.noarch

No idea whether it was a podman or a firewalld issue, but it works now ^^