The core issue is that systemd is clearing Podman's temporary files directory. We cannot be expected to continue functioning sanely after files were deleted; simultaneously, the directory we're using, /run/user/$UID, is the only tmpfs guaranteed to be writable by non-root users (assuming it exists). We've evaluated other options, but at present this is the best way we have of doing things. I'm open to suggestions if folks have a better idea, but substantial thought has been put into this already, and this is the best solution we've found.
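For context, a quick way to look at the directory in question on a systemd host (UID 1001 matches the reproducer later in this thread; rootless Podman resolves the location through XDG_RUNTIME_DIR):

echo "$XDG_RUNTIME_DIR"     # typically /run/user/1001 for UID 1001
findmnt "$XDG_RUNTIME_DIR"  # shows the per-user tmpfs that systemd-logind mounts there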
Describe the results you expected: No error, containers don't exit upon logout
This is not possible without using lingering mode, though. systemd will kill all of the user's processes, and there is nothing Podman can do to prevent it.
Since this is not an issue Podman can deal with, and we have given you a solution, I am closing it. Please feel free to continue the discussion here.
For future travelers who stumble into this issue, this was the command I ran to enable the "lingering mode" giuseppe mentioned: loginctl enable-linger.
It has also been added to the troubleshooting guide here.
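As a concrete sketch (run as root; test2 is the user from the reproducer below, and without a username the command applies to the calling user instead):

sudo loginctl enable-linger test2     # keep test2's user manager and /run/user/1001 alive after logout
sudo loginctl disable-linger test2    # undoes it again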
Is this not the same issue as this?
It's really the inverse. In that issue, the Podman temporary files directory was not being cleared at all; in this issue, it's being cleared too aggressively. Both prevent Podman from functioning correctly.
Is it possible to add a tmpfile rule to handle this issue as well?
Potentially? Removing the Podman SHM locks (/dev/shm/libpod_rootless_lock_$UID) for the user that logged out, when they log out, could work.
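I have not tried this, but purely as a sketch of what such a rule could look like (the lock path is the one quoted above, the drop-in file name is made up, and tmpfiles.d cleanup runs at boot or on a timer rather than at logout, so on its own it would not cover the logout case):

printf 'r /dev/shm/libpod_rootless_lock_*\n' | sudo tee /etc/tmpfiles.d/podman-rootless-locks.conf
sudo systemd-tmpfiles --remove /etc/tmpfiles.d/podman-rootless-locks.conf   # 'r' lines only act when --remove is passed; take care not to run this while that user's containers are up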
It might just be a peculiar coincidence, but I have only had this problem on a server which never had lingering enabled and was missing the /var/lib/systemd/linger directory. On a server that once had lingering enabled, then disabled (hence creating /var/lib/systemd/linger), but is otherwise identical to the first server, I never had this problem.
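For anyone comparing setups, the lingering state is easy to check (test2 is the user from the report below):

ls /var/lib/systemd/linger                    # contains one file per user with lingering enabled
loginctl show-user test2 --property=Linger    # prints Linger=yes or Linger=no while the user is known to logind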
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Steps to reproduce the issue:
[root@rhel85 ~]# ssh test2@localhost
test2@localhost's password:
Last login: Tue Feb 15 15:30:02 2022 from ::1
[test2@rhel85 ~]$ podman run -d registry.access.redhat.com/rhel7 sleep infinity
3f93bc489b026d7b707599072424017fcf56e4654524cd6b2def1cab853fb5f3
[root@rhel85 ~]# sleep 90 ; ll /run/user
total 0
drwx------. 3 root  root   80 Feb 15 09:40 0
drwx------. 8 test2 test2 180 Feb 15 15:37 1001
drwx------. 6 test3 test3 140 Feb 15 13:49 1002
[root@rhel85 ~]# ssh test2@localhost
test2@localhost's password:
Last login: Tue Feb 15 15:35:19 2022 from ::1
[test2@rhel85 ~]$ podman --log-level=debug ps -a
INFO[0000] podman filtering at log level debug
DEBU[0000] Called ps.PersistentPreRunE(podman --log-level=debug ps -a)
DEBU[0000] Merged system config "/usr/share/containers/containers.conf"
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /home/test2/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/test2/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1001/containers
DEBU[0000] Using static dir /home/test2/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1001/libpod/tmp
DEBU[0000] Using volume path /home/test2/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] Not configuring container store
DEBU[0000] Initializing event backend file
DEBU[0000] configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] configured OCI runtime crun initialization failed: no valid executable found for OCI runtime crun: invalid argument
DEBU[0000] configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/runc"
INFO[0000] Found CNI network podman (type=bridge) at /home/test2/.config/cni/net.d/87-podman.conflist
DEBU[0000] Default CNI network name podman is unchangeable
INFO[0000] podman filtering at log level debug
DEBU[0000] Called ps.PersistentPreRunE(podman --log-level=debug ps -a)
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] Merged system config "/usr/share/containers/containers.conf"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /home/test2/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/test2/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1001/containers
DEBU[0000] Using static dir /home/test2/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1001/libpod/tmp
DEBU[0000] Using volume path /home/test2/.local/share/containers/storage/volumes
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] overlay test mount indicated that metacopy is not being used
DEBU[0000] backingFs=xfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false
DEBU[0000] Initializing event backend file
DEBU[0000] configured OCI runtime crun initialization failed: no valid executable found for OCI runtime crun: invalid argument
DEBU[0000] configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/runc"
INFO[0000] Found CNI network podman (type=bridge) at /home/test2/.config/cni/net.d/87-podman.conflist
DEBU[0000] Default CNI network name podman is unchangeable
DEBU[0000] Podman detected system restart - performing state refresh
ERRO[0000] Error refreshing container 3f93bc489b026d7b707599072424017fcf56e4654524cd6b2def1cab853fb5f3: error acquiring lock 0 for container 3f93bc489b026d7b707599072424017fcf56e4654524cd6b2def1cab853fb5f3: file exists
ERRO[0000] Error refreshing volume c9cccab63362ff8438bbb38f5be7cdd1132a8bde72a69cb32db0bbf9db715f02: error acquiring lock 4 for volume c9cccab63362ff8438bbb38f5be7cdd1132a8bde72a69cb32db0bbf9db715f02: file exists
INFO[0000] Setting parallel job count to 7
DEBU[0000] container 3f93bc489b026d7b707599072424017fcf56e4654524cd6b2def1cab853fb5f3 has no defined healthcheck
CONTAINER ID  IMAGE                                    COMMAND         CREATED             STATUS   PORTS  NAMES
3f93bc489b02  registry.access.redhat.com/rhel7:latest  sleep infinity  About a minute ago  Created         compassionate_hofstadter
DEBU[0000] Called ps.PersistentPostRunE(podman --log-level=debug ps -a)
Describe the results you received: Upon the first podman command run after logging back in, we get an error about acquiring locks.
Describe the results you expected: No error, containers don't exit upon logout
Additional information you deem important (e.g. issue happens only occasionally): The workaround for this is to run 'loginctl enable-linger', but perhaps we need to reevaluate how Podman performs its reboot detection?
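For what it's worth, a sketch of confirming the workaround against the reproducer above (test2/UID 1001 as in the logs):

sudo loginctl enable-linger test2
# log in as test2, start the container, log out, then as root:
sleep 90 ; ls -ld /run/user/1001    # with lingering enabled the runtime dir survives, so the next podman command sees no "reboot"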
The error output only appears the first time you run a podman command after logging back in: Podman decides that a reboot has happened because the runtimeAliveFile is no longer present.
libpod/runtime.go
libpod/container_internal.go
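To make that concrete, here is roughly how to watch the trigger by hand; the name of the alive file inside the tmp dir is an assumption on my part, and the paths are the rootless defaults from the debug log above:

ls /run/user/1001/libpod/tmp     # holds the runtime's alive marker after any podman command
loginctl terminate-user test2    # end the user's sessions, i.e. the logout from the reproducer
sleep 90 ; ls /run/user/1001     # systemd has torn the runtime dir down, so the marker is gone
# the next rootless podman command finds no alive file, assumes a reboot, and runs the state
# refresh that then collides with the stale /dev/shm lock segments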
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
No, but Podman still performs the same checks in the latest version, so this would still be an issue.
Additional environment details (AWS, VirtualBox, physical, etc.):