
Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman does not remove the overlay storage when the systemd service is restarted during reboot or shutdown (still in v5.1.1) #23504

Open · rugk opened this issue 3 months ago

rugk commented 3 months ago

Issue Description

Basically, https://github.com/containers/podman/issues/19913 aka https://github.com/containers/podman/issues/21093 aka https://github.com/containers/podman/issues/19491 still happens on my system.

Steps to reproduce the issue

Any podman / podman-compose command involving containers with volumes; in my case they are managed by this systemd user service:

$ systemctl --user cat nextcloud-workaround
# **HOME_DIR***/.config/systemd/user/nextcloud-workaround.service
[Unit]
Description=Nextcloud workaround service
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/bin/sleep 30
ExecStartPre=/bin/bash -c "cd ~/****/nextcloud && podman-compose pull"
ExecStart=/bin/bash -c "cd ~/****/nextcloud && podman-compose -p nextcloud down && podman-compose --in-pod=0 -p nextcloud up -d"
ExecStop=/bin/bash -c "cd ~/****/nextcloud && podman-compose -p nextcloud down"
Restart=on-failure

[Install]
WantedBy=default.target

# /usr/lib/systemd/user/service.d/10-timeout-abort.conf
# This file is part of the systemd package.
# See https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer.
#
# To facilitate debugging when a service fails to stop cleanly,
# TimeoutStopFailureMode=abort is set to "crash" services that fail to stop in
# the time allotted. This will cause the service to be terminated with SIGABRT
# and a coredump to be generated.
#
# To undo this configuration change, create a mask file:
#   sudo mkdir -p /etc/systemd/user/service.d
#   sudo ln -sv /dev/null /etc/systemd/user/service.d/10-timeout-abort.conf

[Service]
TimeoutStopFailureMode=abort

I had already added the ExecStartPre sleep thinking it might solve the problem; it does not…

Describe the results you received

If I reboot the system, the containers do not come up at the next start, and running podman-compose down produces the well-known errors:

WARN[0000] Unmounting container "nextcloud_db_1" while attempting to delete storage: replacing mount point "/var/home/c-nextcloud/.local/share/containers/storage/overlay/679b0b96e3f7966294fa76e8a2354ad861d28fd5f6976e7849210d20d81c57dd/merged": directory not empty 
Error: removing storage for container "nextcloud_db_1": replacing mount point "/var/home/c-nextcloud/.local/share/containers/storage/overlay/679b0b96e3f7966294fa76e8a2354ad861d28fd5f6976e7849210d20d81c57dd/merged": directory not empty
Backup/move /var/home/c-nextcloud/.local/share/containers/storage/overlay/679b0b96e3f7966294fa76e8a2354ad861d28fd5f6976e7849210d20d81c57dd/merged directory

My own script from https://github.com/containers/podman/issues/19913#issuecomment-1750658431 still resolves the issue (the last line above is part of its log output), even though this was supposedly fixed in Podman v4.7.
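
For reference, the linked script boils down to this kind of cleanup (a simplified sketch of the idea, not the exact script; cleanup_merged is my name for the helper): any leftover non-empty merged directory that is no longer an active mount point gets moved aside and recreated empty, so Podman can replace the mount point again.

```shell
#!/bin/sh
# Sketch of the workaround: move aside leftover overlay "merged" directories
# that survived an unclean shutdown. (cleanup_merged is a hypothetical helper,
# not part of Podman and not the exact script linked above.)
cleanup_merged() {
    storage="$1"   # e.g. ~/.local/share/containers/storage
    for merged in "$storage"/overlay/*/merged; do
        [ -d "$merged" ] || continue
        # Leave real, still-active mount points alone.
        if command -v mountpoint >/dev/null 2>&1 && mountpoint -q "$merged"; then
            continue
        fi
        # Only touch non-empty directories -- the exact condition Podman
        # complains about ("directory not empty").
        if [ -n "$(ls -A "$merged" 2>/dev/null)" ]; then
            echo "Backup/move $merged directory"
            mv "$merged" "$merged.bak"
            mkdir "$merged"
        fi
    done
}
```

Something like this could run before the containers start, e.g. as an additional ExecStartPre= in the workaround unit above.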

Notably, this has never worked on my system (and AFAIK I have tried every updated version that has been pushed through CoreOS).

Describe the results you expected

The containers should just stop cleanly, so that they come back up after a reboot.

podman info output

podman info
host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 97.43
    systemPercent: 1.38
    userPercent: 1.19
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "40"
  eventLogger: journald
  freeLocks: 430
  hostname: minipure
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
  kernel: 6.9.7-200.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 62498131968
  memTotal: 67289849856
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.11.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.11.0
    package: netavark-1.11.0-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/1002/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240624.g1ee2eca-1.fc40.x86_64
    version: |
      pasta 0^20240624.g1ee2eca-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1002/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-2.fc40.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 0h 50m 23.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: **HOME_DIR***/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 3
    stopped: 7
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: **HOME_DIR***/.local/share/containers/storage
  graphRootAllocated: 999650168832
  graphRootUsed: 52357189632
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 31
  runRoot: /run/user/1002/containers
  transientStore: false
  volumePath: **HOME_DIR***/.local/share/containers/storage/volumes
version:
  APIVersion: 5.1.1
  Built: 1717459200
  BuiltTime: Tue Jun  4 02:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.3
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

$ rpm-ostree status -b
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Mon 2024-08-05 14:00:10 UTC)
BootedDeployment:
● fedora:fedora/x86_64/coreos/stable
                  Version: 40.20240709.3.1 (2024-07-29T18:52:14Z)
               BaseCommit: 0e10a21afc41b591b6eae884fc7d2a18f18aa93c92f47e1f2c53691db46102e8
             GPGSignature: Valid signature by 115DF9AEF857853EE8445D0A0727707EA15B79CC
          LayeredPackages: *** podman-compose

Additional information

No response

giuseppe commented 3 months ago

what is in the /var/home/c-nextcloud/.local/share/containers/storage/overlay/679b0b96e3f7966294fa76e8a2354ad861d28fd5f6976e7849210d20d81c57dd/merged directory after you reboot the system (and before you attempt any podman command)?
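
For example, something like this would capture that state (a sketch; inspect_merged is just an illustrative helper, and the storage path is per-user):

```shell
#!/bin/sh
# Diagnostic sketch: list what is left in a "merged" directory after reboot
# and check whether it is still an active mount point.
inspect_merged() {
    dir="$1"
    echo "contents of $dir:"
    # -Z shows SELinux labels where supported; fall back to plain ls otherwise.
    ls -laZ "$dir" 2>/dev/null || ls -la "$dir"
    if mountpoint -q "$dir" 2>/dev/null; then
        echo "$dir is still a mount point"
    else
        echo "$dir is not a mount point"
    fi
}
```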

github-actions[bot] commented 2 months ago

A friendly reminder that this issue had no activity for 30 days.

rugk commented 1 month ago

Sorry for the delay, it took some time until I could get to this…

So e.g. I get this:

WARN[0000] Unmounting container "nextcloud_redis_1" while attempting to delete storage: replacing mount point "/var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged": directory not empty 
Error: removing storage for container "nextcloud_redis_1": replacing mount point "/var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged": directory not empty

It may absolutely be related to an unclean shutdown. I mean, I just ran reboot, thinking that would shut everything down cleanly… :upside_down_face:

So as for your question:

$ ls -la /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged
total 0
drwx------. 1 c-nextcloud c-nextcloud 40 Sep 25 12:52 .
drwx------. 1 c-nextcloud c-nextcloud 62 Sep 25 12:52 ..
drwxr-xr-t. 1 c-nextcloud c-nextcloud  0 Sep 25 12:52 data
drwxr-xr-t. 1 c-nextcloud c-nextcloud  0 Sep 25 12:52 dev
drwxr-xr-x. 1 c-nextcloud c-nextcloud 48 Sep 25 12:52 etc
drwxr-xr-x. 1 c-nextcloud c-nextcloud  0 Sep 25 12:52 proc
drwxr-xr-x. 1 c-nextcloud c-nextcloud 40 Sep 25 12:52 run
drwxr-xr-x. 1 c-nextcloud c-nextcloud  0 Sep 25 12:52 sys

Notably, there is another directory side-by-side:

$ ls -la /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged.1/
total 0
drwx------. 1 c-nextcloud c-nextcloud  0 Sep 25 12:52 .
drwx------. 1 c-nextcloud c-nextcloud 62 Sep 25 12:52 ..

For some reason, its contents are not visible inside a container that I have created:

$ podman run -it -v /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/ busybox
/ # ls -la /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/
total 0
drwxr-xr-x    1 root     root             0 Sep 25 12:42 .
drwxr-xr-t    3 root     root            36 Sep 25 12:42 ..

Even with SELinux labeling disabled:

$ podman run -it -v /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/ --security-opt label=disable busybox
/ # ls -la /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged
total 0
drwxr-xr-x    1 root     root             0 Sep 25 12:46 .
drwxr-xr-t    3 root     root            36 Sep 25 12:46 ..
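
One possible explanation for the empty listing (an assumption on my part, not something confirmed in this thread): when -v is given only a single path, Podman creates an anonymous volume at that path inside the container instead of bind-mounting the host directory, so an empty volume is what the container sees. A host-to-container bind mount needs the source:destination form:

```shell
# Bind-mount the host directory into the container (single-path -v would
# instead create an empty anonymous volume at that container path):
podman run --rm -it \
  -v /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged:/mnt/merged:ro \
  busybox ls -la /mnt/merged
```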

For reference, here is what the SELinux labels look like:

$ ls -laZ /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged
total 0
drwx------. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0 40 Sep 25 12:52 .
drwx------. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0 62 Sep 25 12:52 ..
drwxr-xr-t. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0  0 Sep 25 12:52 data
drwxr-xr-t. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0  0 Sep 25 12:52 dev
drwxr-xr-x. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0 48 Sep 25 12:52 etc
drwxr-xr-x. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0  0 Sep 25 12:52 proc
drwxr-xr-x. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0 40 Sep 25 12:52 run
drwxr-xr-x. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0  0 Sep 25 12:52 sys

Also, all of the folders in there are empty except one!

Here is the non-empty one; as you can see, every directory has size 0 and the files are empty too:

$ du -h /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/sys
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/dev
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/data
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/proc
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/etc
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/run/secrets
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/run
0   /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged
$ ls -laZ /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/run
total 0
drwxr-xr-x. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0 40 Sep 25 12:52 .
drwx------. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0 40 Sep 25 12:52 ..
-rwx------. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0  0 Sep 25 12:52 .containerenv
drwxr-xr-t. 1 c-nextcloud c-nextcloud unconfined_u:object_r:data_home_t:s0  0 Sep 25 12:52 secrets
$ cat /var/home/c-nextcloud/.local/share/containers/storage/overlay/d3a5f0ffaecee9c32bf022e9ad0652ae314ff08e058f09d79a05df69963b93b8/merged/run/.containerenv 

I am not sure what the actual issue is here. Podman could presumably just delete these…?

It's exactly the same for all other affected containers.

rugk commented 1 month ago

This still happens. Any news, or other things I should try?