containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

User-level podman-restart.service does not work properly on reboot #22451

Closed: urbenlegend closed this issue 6 months ago

urbenlegend commented 7 months ago

Issue Description

I have several rootless containers created via docker-compose files. The user running the containers has linger enabled. I am trying to use the user-level podman-restart.service to start and stop my containers on reboot. Currently it is delaying shutdown and also failing to start my containers back up on system boot.
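
For reference, the setup described above roughly amounts to the following commands (the username is a placeholder; this is a sketch of the steps, not a transcript of exactly what I ran):

# let the user's systemd instance keep running without an active login session
sudo loginctl enable-linger <user>
# enable the user-level restart unit shipped with podman
systemctl --user enable podman-restart.service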

Here are the podman-restart logs on shutdown:

Apr 20 17:09:32 arch-nas sh[666533]: Failed to start transient scope unit: Transaction for run-r6e5199038ff542a5bb0ceb92f7c0e2ed.scope/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in trans>
Apr 20 17:09:32 arch-nas sh[666532]: [INFO  netavark::firewall] Using iptables firewall driver
Apr 20 17:09:32 arch-nas podman[665754]: 2024-04-20 17:09:32.335366315 -0700 PDT m=+0.412728883 container cleanup 438044e1026dfb77131ae1bb79edfa3ebf35cc23c5c079dccb04645178a6f2ef (image=docker.io/jellyfin/jellyfin:latest, name=t>
Apr 20 17:09:35 arch-nas podman[665754]: 2024-04-20 17:09:35.993303932 -0700 PDT m=+4.070666520 container died 02d8e7cebf60ae3ed71d71974e5a27e6e598d9040dacf91b79d93cd43c6e125f (image=docker.io/jc21/nginx-proxy-manager:latest, na>
Apr 20 17:09:36 arch-nas podman[665754]: 2024-04-20 17:09:36.024003389 -0700 PDT m=+4.101365967 container exec_died 02d8e7cebf60ae3ed71d71974e5a27e6e598d9040dacf91b79d93cd43c6e125f (image=docker.io/jc21/nginx-proxy-manager:lates>
Apr 20 17:09:36 arch-nas sh[666568]: Failed to start transient scope unit: Transaction for run-r6aaf2cdb306d4fe0ac39fcd7bc015d09.scope/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in trans>
Apr 20 17:09:36 arch-nas sh[666567]: [INFO  netavark::firewall] Using iptables firewall driver
Apr 20 17:09:36 arch-nas sh[666567]: [INFO  netavark::network::bridge] removing bridge podman3
Apr 20 17:09:36 arch-nas sh[666567]: [INFO  netavark::network::bridge] removing bridge podman2
Apr 20 17:09:37 arch-nas sh[666567]: [INFO  netavark::network::bridge] removing bridge podman1
Apr 20 17:09:37 arch-nas podman[665754]: 2024-04-20 17:09:37.698012085 -0700 PDT m=+5.775374654 container cleanup 02d8e7cebf60ae3ed71d71974e5a27e6e598d9040dacf91b79d93cd43c6e125f (image=docker.io/jc21/nginx-proxy-manager:latest,>
Apr 20 17:09:41 arch-nas sh[665754]: time="2024-04-20T17:09:41-07:00" level=warning msg="StopSignal SIGWINCH failed to stop container nextcloud-cron-1 in 10 seconds, resorting to SIGKILL"
Apr 20 17:09:41 arch-nas podman[665754]: 2024-04-20 17:09:41.968260582 -0700 PDT m=+10.045623170 container died 2fc11d6b2020c41698d36994278ed9ca4442c799172ab32b0921c4c37079aabd (image=docker.io/library/nextcloud:apache, name=nex>
Apr 20 17:09:41 arch-nas sh[666691]: Failed to start transient scope unit: Transaction for run-r7edf2d5dc45c424d87a8189b5719f014.scope/start is destructive (systemd-exit.service has 'start' job queued, but 'stop' is included in >
Apr 20 17:09:41 arch-nas sh[666690]: [INFO  netavark::firewall] Using iptables firewall driver
Apr 20 17:09:42 arch-nas sh[666690]: [INFO  netavark::network::bridge] removing bridge podman4
Apr 20 17:09:42 arch-nas podman[665754]: 2024-04-20 17:09:42.43743908 -0700 PDT m=+10.514801658 container cleanup 2fc11d6b2020c41698d36994278ed9ca4442c799172ab32b0921c4c37079aabd (image=docker.io/library/nextcloud:apache, name=n>
Apr 20 17:09:42 arch-nas sh[665754]: 2fc11d6b2020
Apr 20 17:09:42 arch-nas sh[665754]: 02d8e7cebf60
Apr 20 17:09:42 arch-nas sh[665754]: 438044e1026d
Apr 20 17:09:42 arch-nas systemd[1106]: Stopped Podman Start All Containers With Restart Policy Set To Always.

It seems like it's attempting to start the containers back up even though the system is shutting down?
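
For reference, the exact commands the unit runs on start and on stop can be inspected with the following (just a way to look at the unit, not output from my system):

# print the podman-restart unit, including any drop-ins, as the user manager sees it
systemctl --user cat podman-restart.service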

Here are the podman-restart logs on boot:

Apr 20 17:11:46 arch-nas systemd[1125]: Starting Podman Start All Containers With Restart Policy Set To Always...
Apr 20 17:11:46 arch-nas podman[1143]: time="2024-04-20T17:11:46-07:00" level=info msg="/usr/bin/podman filtering at log level info"
Apr 20 17:11:46 arch-nas podman[1143]: time="2024-04-20T17:11:46-07:00" level=info msg="Using sqlite as database backend"
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=info msg="/usr/bin/podman filtering at log level info"
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=info msg="Using sqlite as database backend"
Apr 20 17:11:46 arch-nas podman[1165]: 2024-04-20 17:11:46.532300494 -0700 PDT m=+0.274005567 system refresh
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=info msg="Setting parallel job count to 37"
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=warning msg="Requested oom_score_adj=0 is lower than the current one, changing to 200"
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=info msg="Got Conmon PID as 1199"
Apr 20 17:11:46 arch-nas pasta[1206]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: 2024-04-20 17:11:46.622435217 -0700 PDT m=+0.364140280 container init 4fffd4fdd91f7d8f0cc63f79b0d41c830b512d403bacedd1daf9c91dd5d48a38 (image=docker.io/jellyfin/jellyfin:latest, name=jellyf>
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=error msg="rootless netns ref counter out of sync, counter is at -1, resetting it back to 0"
Apr 20 17:11:46 arch-nas pasta[1207]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=error msg="Unable to clean up network for container 4fffd4fdd91f7d8f0cc63f79b0d41c830b512d403bacedd1daf9c91dd5d48a38: \"2 errors occurred:\\n\\t* sett>
Apr 20 17:11:46 arch-nas conmon[1199]: conmon 4fffd4fdd91f7d8f0cc6 <ninfo>: container 1202 exited with status 137
Apr 20 17:11:46 arch-nas pasta[1221]: External interface not usable
Apr 20 17:11:46 arch-nas pasta[1224]: External interface not usable
Apr 20 17:11:46 arch-nas pasta[1226]: External interface not usable
Apr 20 17:11:46 arch-nas pasta[1228]: External interface not usable
Apr 20 17:11:46 arch-nas pasta[1239]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: time="2024-04-20T17:11:46-07:00" level=warning msg="Requested oom_score_adj=0 is lower than the current one, changing to 200"
Apr 20 17:11:46 arch-nas conmon[1241]: conmon 438044e1026dfb77131a <nwarn>: runtime stderr: cannot stat `/external/P`: No such file or directory
Apr 20 17:11:46 arch-nas conmon[1241]: conmon 438044e1026dfb77131a <error>: Failed to create container: exit status 1
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "4fffd4fdd91f7d8f0cc63f79b0d41c830b512d403bacedd1daf9c91dd5d48a38": setting up Pasta: pasta failed with exit code 1:
Apr 20 17:11:46 arch-nas podman[1165]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "b88a10a07cd9365378fb94a0952418ab0704b43afb35f3a55073ce99a97757a4": setting up Pasta: pasta failed with exit code 1:
Apr 20 17:11:46 arch-nas podman[1165]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "a7268ef1485ee512e800e942f3e3e539289b402533118e78f990cea321ca4ab9": setting up Pasta: pasta failed with exit code 1:
Apr 20 17:11:46 arch-nas podman[1165]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "2fc11d6b2020c41698d36994278ed9ca4442c799172ab32b0921c4c37079aabd": setting up Pasta: pasta failed with exit code 1:
Apr 20 17:11:46 arch-nas podman[1165]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "18edb4be2665dda849696fad7382c7c6db6f132fb0ff9efc41a6a946d046529f": setting up Pasta: pasta failed with exit code 1:
Apr 20 17:11:46 arch-nas podman[1165]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "02d8e7cebf60ae3ed71d71974e5a27e6e598d9040dacf91b79d93cd43c6e125f": setting up Pasta: pasta failed with exit code 1:
Apr 20 17:11:46 arch-nas podman[1165]: External interface not usable
Apr 20 17:11:46 arch-nas podman[1165]: Error: unable to start container "438044e1026dfb77131ae1bb79edfa3ebf35cc23c5c079dccb04645178a6f2ef": crun: cannot stat `/external`: No such file or directory: OCI runtime attempted to inv>
Apr 20 17:11:46 arch-nas systemd[1125]: podman-restart.service: Main process exited, code=exited, status=125/n/a
Apr 20 17:11:46 arch-nas systemd[1125]: podman-restart.service: Failed with result 'exit-code'.
Apr 20 17:11:46 arch-nas systemd[1125]: Failed to start Podman Start All Containers With Restart Policy Set To Always.

Here it seems like the network isn't available yet when the containers start up. It is also complaining about not finding one of my drive mounts. Shouldn't podman-restart.service wait until the network and mounts are up before attempting startup? If I manually restart podman-restart.service after boot, everything works fine, which points to a race condition during startup.
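
The manual recovery mentioned above is just kicking the unit again once the session is up, roughly:

# restart the user unit after the network and mounts are available
systemctl --user restart podman-restart.service
# confirm the containers came back
podman ps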

Steps to reproduce the issue

  1. Enable the user-level podman-restart.service.
  2. Enable linger for the podman user.
  3. Create a few rootless containers using docker-compose (see the check below).
  4. Reboot.
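
To confirm step 3 produced containers the unit will act on, the restart policy can be read back per container (the container name is a placeholder; the field path is the Docker-compatible one podman exposes in its inspect output):

# podman-restart only starts containers whose restart policy is "always"
podman inspect --format '{{ .HostConfig.RestartPolicy.Name }}' <container-name>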

Describe the results you received

System shutdown is sometimes delayed by podman-restart.service. Containers also fail to start back up on boot because the network interface is not up yet.

Describe the results you expected

Podman should cleanly shut down all containers before reboot. Containers should also automatically start back up on boot.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.3
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.10-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 2dcd736e46ded79a53339462bc251694b150f870'
  cpuUtilization:
    idlePercent: 99.39
    systemPercent: 0.29
    userPercent: 0.32
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 2037
  hostname: arch-nas
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 6.6.28-1-lts
  linkmode: dynamic
  logDriver: journald
  memFree: 28629639168
  memTotal: 32990961664
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: /usr/lib/podman/aardvark-dns is owned by aardvark-dns 1.10.0-1
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: /usr/lib/podman/netavark is owned by netavark 1.10.3-1
    path: /usr/lib/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.14.4-1
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: /usr/bin/pasta is owned by passt 2024_04_05.954589b-1
    version: |
      pasta 2024_04_05.954589b
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.3-1
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 17179865088
  swapTotal: 17179865088
  uptime: 0h 16m 34.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/selfhost/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 7
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/selfhost/.local/share/containers/storage
  graphRootAllocated: 238342373376
  graphRootUsed: 58961645568
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 5
  runRoot: /tmp/storage-run-1001/containers
  transientStore: false
  volumePath: /home/selfhost/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.2
  Built: 1713438799
  BuiltTime: Thu Apr 18 04:13:19 2024
  GitCommit: 3304dd95b8978a8346b96b7d43134990609b3b29-dirty
  GoVersion: go1.22.2
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Additional information

rhatdan commented 6 months ago

@Luap99 Likely not waiting for network.target?

Luap99 commented 6 months ago

A user unit cannot wait for network-online.target; this looks like the same underlying issue as https://github.com/containers/podman/issues/22197 for the startup failure on boot.

As for stopping, the log clearly shows that the issue is with your container; no idea why you configured SIGWINCH as the stop signal, because that is not what it is normally used for.

level=warning msg="StopSignal SIGWINCH failed to stop container nextcloud-cron-1 in 10 seconds, resorting to SIGKILL"
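
(For reference, the stop signal a container was created with can be read back from its config; the container name below is taken from the warning above:)

# show the configured stop signal for the container
podman inspect --format '{{ .Config.StopSignal }}' nextcloud-cron-1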

Closing as dup of #22197

urbenlegend commented 6 months ago

@Luap99 Thanks for the info and the dup. I did not configure SIGWINCH; I am just using the container defined in the Nextcloud docker example: https://github.com/nextcloud/docker/blob/master/.examples/docker-compose/insecure/mariadb/apache/docker-compose.yml