containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Racy systemd integration with RestrictAddressFamilies option #24012

Closed by ciandonovan 1 month ago

ciandonovan commented 1 month ago

Issue Description

I run my podman containers in unprivileged systemd user services. This has previously worked quite well. I wanted to use the RestrictAddressFamilies= option in systemd.exec to limit the containers to only the families AF_UNIX AF_NETLINK AF_INET AF_INET6.

I run all the containers with --net=host, but want to limit access to a SocketCAN interface to just one container. That container's unit would add another option, RestrictAddressFamilies=AF_CAN, making the allowed set AF_UNIX AF_NETLINK AF_INET AF_INET6 AF_CAN for that one container only.

With the RestrictAddressFamilies= option applied I ran into a very strange issue: the container fails to start with the message level=error msg="running `/usr/bin/newuidmap 2742 0 1000 1 1 100000 65536`: newuidmap: write to uid_map failed: Operation not permitted\n"

However, if any single Podman command is run outside of the service, the service immediately starts working. I'm guessing the RestrictAddressFamilies= option is somehow blocking the newuidmap call, but once that has been set up it doesn't need to be run again, so the container in the restricted service works without issue for as long as the system stays up. After a reboot, the issue arises again until another command is run manually.

Steps to reproduce the issue


  1. Create the systemd user service file ~/.config/systemd/user/ros_humble_desktop.service
[Unit]
Description=ROS Podman container for ros_humble_desktop

RequiresMountsFor=%t/containers

[Service]
Type=notify
NotifyAccess=all
Delegate=yes

Environment=PODMAN_SYSTEMD_UNIT=%n

TimeoutStartSec=1h

RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6

ExecStartPre=/bin/rm -f %t/%n.ctr-id
ExecStartPre=/usr/bin/podman rm -fi ros_humble_desktop

ExecStart=/usr/bin/podman run --rm --replace --cidfile=%t/%n.ctr-id \
--name ros_humble_desktop \
--sdnotify=conmon \
--pull=newer \
--log-driver=none \
--cgroups=split \
--label "io.containers.autoupdate=image" \
--stop-signal=SIGINT \
--tz=local \
--ipc=host \
--net=host \
--dns=127.0.0.53 \
--annotation run.oci.keep_original_groups=1 \
docker.io/alpine/curl \
curl google.ie

ExecStop=-/usr/bin/podman stop --ignore --cidfile=%t/%n.ctr-id -t=12
ExecStopPost=-/usr/bin/podman rm -fi ros_humble_desktop

TimeoutStopSec=15

RestartSec=5
Restart=always

[Install]
WantedBy=default.target
  2. Enable and run it with systemctl --user daemon-reload && systemctl --user enable --now ros_humble_desktop.service
  3. Monitor the output with journalctl --user-unit ros_humble_desktop.service -f

Describe the results you received

Repeated failure to start the service/container with the below:

Sep 18 23:46:17 VA0002 systemd[707]: Starting ros_humble_desktop.service - ROS Podman container for ros_humble_desktop...
Sep 18 23:46:17 VA0002 podman[3409]: time="2024-09-18T23:46:17+01:00" level=error msg="running `/usr/bin/newuidmap 3421 0 1000 1 1 100000 65536`: newuidmap: write to uid_map failed: Operation not permitted\n"
Sep 18 23:46:17 VA0002 podman[3409]: Error: cannot set up namespace using "/usr/bin/newuidmap": exit status 1
Sep 18 23:46:17 VA0002 systemd[707]: ros_humble_desktop.service: Control process exited, code=exited, status=125/n/a
Sep 18 23:46:17 VA0002 podman[3423]: time="2024-09-18T23:46:17+01:00" level=error msg="running `/usr/bin/newuidmap 3436 0 1000 1 1 100000 65536`: newuidmap: write to uid_map failed: Operation not permitted\n"
Sep 18 23:46:17 VA0002 podman[3423]: Error: cannot set up namespace using "/usr/bin/newuidmap": exit status 1
Sep 18 23:46:17 VA0002 systemd[707]: ros_humble_desktop.service: Failed with result 'exit-code'.
Sep 18 23:46:17 VA0002 systemd[707]: Failed to start ros_humble_desktop.service - ROS Podman container for ros_humble_desktop.

Once I run a command such as podman info outside the service, it works:

Sep 18 23:50:53 VA0002 systemd[713]: Starting ros_humble_desktop.service - ROS Podman container for ros_humble_desktop...
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:50:53.29952101 +0100 IST m=+0.020252982 image pull  docker.io/alpine/curl
Sep 18 23:51:25 VA0002 podman[1517]: 
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.334238638 +0100 IST m=+32.054970662 container create 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.402326374 +0100 IST m=+32.123058336 container init 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 systemd[713]: Started ros_humble_desktop.service - ROS Podman container for ros_humble_desktop.
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.406259123 +0100 IST m=+32.126991157 container start 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.406450398 +0100 IST m=+32.127182386 container attach 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]:                                  Dload  Upload   Total   Spent    Left  Speed
Sep 18 23:51:25 VA0002 podman[1517]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 18 23:51:25 VA0002 podman[1517]:                                  Dload  Upload   Total   Spent    Left  Speed
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <TITLE>301 Moved</TITLE></HEAD><BODY>
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <H1>301 Moved</H1>
Sep 18 23:51:25 VA0002 podman[1517]: [157B blob data]
Sep 18 23:51:25 VA0002 podman[1517]: <TITLE>301 Moved</TITLE></HEAD><BODY>
Sep 18 23:51:25 VA0002 podman[1517]: <H1>301 Moved</H1>
Sep 18 23:51:25 VA0002 podman[1517]: The document has moved
Sep 18 23:51:25 VA0002 podman[1517]: <A HREF="http://www.google.ie/">here</A>.
Sep 18 23:51:25 VA0002 podman[1517]: </BODY></HTML>
Sep 18 23:51:25 VA0002 podman[1517]: [79B blob data]
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: The document has moved
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <A HREF="http://www.google.ie/">here</A>.
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: </BODY></HTML>
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: [159B blob data]
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.624007594 +0100 IST m=+32.344739737 container died 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 podman[1536]: 2024-09-18 23:51:25.69680878 +0100 IST m=+0.054917710 container remove 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service, io.containers.autoupdate=image)
Sep 18 23:51:30 VA0002 systemd[713]: ros_humble_desktop.service: Scheduled restart job, restart counter is at 14.
Sep 18 23:51:30 VA0002 systemd[713]: Stopped ros_humble_desktop.service - ROS Podman container for ros_humble_desktop.

https://github.com/user-attachments/assets/70e33234-b1f4-4727-96d1-6f8c0780a8b1

Describe the results you expected

Sep 18 23:50:53 VA0002 systemd[713]: Starting ros_humble_desktop.service - ROS Podman container for ros_humble_desktop...
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:50:53.29952101 +0100 IST m=+0.020252982 image pull  docker.io/alpine/curl
Sep 18 23:51:25 VA0002 podman[1517]: 
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.334238638 +0100 IST m=+32.054970662 container create 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.402326374 +0100 IST m=+32.123058336 container init 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 systemd[713]: Started ros_humble_desktop.service - ROS Podman container for ros_humble_desktop.
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.406259123 +0100 IST m=+32.126991157 container start 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.406450398 +0100 IST m=+32.127182386 container attach 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]:                                  Dload  Upload   Total   Spent    Left  Speed
Sep 18 23:51:25 VA0002 podman[1517]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Sep 18 23:51:25 VA0002 podman[1517]:                                  Dload  Upload   Total   Spent    Left  Speed
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <TITLE>301 Moved</TITLE></HEAD><BODY>
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <H1>301 Moved</H1>
Sep 18 23:51:25 VA0002 podman[1517]: [157B blob data]
Sep 18 23:51:25 VA0002 podman[1517]: <TITLE>301 Moved</TITLE></HEAD><BODY>
Sep 18 23:51:25 VA0002 podman[1517]: <H1>301 Moved</H1>
Sep 18 23:51:25 VA0002 podman[1517]: The document has moved
Sep 18 23:51:25 VA0002 podman[1517]: <A HREF="http://www.google.ie/">here</A>.
Sep 18 23:51:25 VA0002 podman[1517]: </BODY></HTML>
Sep 18 23:51:25 VA0002 podman[1517]: [79B blob data]
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: The document has moved
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: <A HREF="http://www.google.ie/">here</A>.
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: </BODY></HTML>
Sep 18 23:51:25 VA0002 ros_humble_desktop[1532]: [159B blob data]
Sep 18 23:51:25 VA0002 podman[1517]: 2024-09-18 23:51:25.624007594 +0100 IST m=+32.344739737 container died 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service)
Sep 18 23:51:25 VA0002 podman[1536]: 2024-09-18 23:51:25.69680878 +0100 IST m=+0.054917710 container remove 14657ba4435715eeeb9fa177712d60462daf83e4ddac187b8074be333d5193f4 (image=docker.io/alpine/curl:latest, name=ros_humble_desktop, PODMAN_SYSTEMD_UNIT=ros_humble_desktop.service, io.containers.autoupdate=image)
Sep 18 23:51:30 VA0002 systemd[713]: ros_humble_desktop.service: Scheduled restart job, restart counter is at 14.
Sep 18 23:51:30 VA0002 systemd[713]: Stopped ros_humble_desktop.service - ROS Podman container for ros_humble_desktop.

podman info output

host:
  arch: amd64
  buildahVersion: 1.28.2
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2.1.6+ds1-1_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: unknown'
  cpuUtilization:
    idlePercent: 98.68
    systemPercent: 1.09
    userPercent: 0.23
  cpus: 8
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  hostname: VA0002
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.0-25-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 15472570368
  memTotal: 16051056640
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun_1.8.1-1+deb12u1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.1
      commit: f8a096be060b22ccd3d5f3ebe44108517fbf6c30
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-1_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 1027600384
  swapTotal: 1027600384
  uptime: 0h 5m 2.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /home/robot/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/robot/.local/share/containers/storage
  graphRootAllocated: 248857907200
  graphRootUsed: 90810155008
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 237
  runRoot: /run/user/1000/containers
  volumePath: /home/robot/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 0
  BuiltTime: Thu Jan  1 01:00:00 1970
  GitCommit: ""
  GoVersion: go1.19.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Running inside an unprivileged systemd user service on Debian Linux 12 Bookworm.

Additional information

This only happens before any other Podman command has been run on the host. After that, the service starts working and keeps working until the next reboot.

ciandonovan commented 1 month ago

Related https://github.com/containers/podman/discussions/14311

eriksjolund commented 1 month ago

Related:

Luap99 commented 1 month ago

Yes, you need something to set up the pause process first if you run with NoNewPrivileges set, so this is expected: https://github.com/containers/podman/discussions/14404
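
A minimal sketch of one way to act on this advice, not a confirmed fix. In user mode, systemd implies NoNewPrivileges=yes whenever RestrictAddressFamilies= is set, which would explain why the privileged newuidmap helper cannot write uid_map from inside the restricted unit. A separate, unrestricted oneshot unit can run a Podman command first, so that the pause process already exists when the container service starts. The unit name podman-pause.service is hypothetical. podman info is used only because it is the command reported above to unblock the service. RemainAfterExit=yes is an assumption, intended to keep systemd from cleaning up the pause process when the command exits.

```ini
# ~/.config/systemd/user/podman-pause.service  (hypothetical unit name)
[Unit]
Description=Set up the rootless Podman pause process

[Service]
Type=oneshot
# Assumption: keep the unit active so leftover processes are not cleaned up.
RemainAfterExit=yes
# Deliberately no RestrictAddressFamilies= or other sandboxing options here,
# so NoNewPrivileges is not implied and newuidmap can do its job.
ExecStart=/usr/bin/podman info

[Install]
WantedBy=default.target

# Lines to add to the existing [Unit] section of ros_humble_desktop.service,
# so the restricted unit starts only after the helper has run:
#   Wants=podman-pause.service
#   After=podman-pause.service
```

After a systemctl --user daemon-reload, the restricted service should pull in and wait for the helper. Whether the pause process survives in this arrangement on Podman 4.3.1 has not been verified here.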