That happens because files on `/tmp` are cleaned up by systemd-tmpfiles if they are older than a week (unless you've configured it differently). Please make sure the run root is on `/run/user` or on a path not handled by systemd-tmpfiles.
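If the run root has to stay under `/tmp` for some reason, a drop-in along these lines should exempt it from age-based cleanup (a sketch; the drop-in file name and glob are hypothetical, see tmpfiles.d(5)):

```
# hypothetical drop-in exempting the run root from age-based cleanup;
# the 'x' type tells systemd-tmpfiles to ignore matching paths
# (including directory contents) during cleaning
$ sudo tee /etc/tmpfiles.d/podman-runroot.conf <<'EOF'
x /tmp/run-*
EOF
```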
Thanks! The issue's source was identified more easily than I expected. However, shouldn't something about this be in the troubleshooting guide at https://github.com/containers/podman/blob/master/troubleshooting.md (or shouldn't podman warn about that behavior in case it falls back to the `/tmp` directory)?
That behavior was particularly confusing, as we have several users on the machine running containers. Some never encountered the issue while others did, and with your hint we figured out that those who never encountered it had their runroot set to `/run/user/$UID`, while those who ran into it had it set to `/tmp/run-$UID`. Those directories had been hard-coded into the `$HOME/.config/containers/storage.conf` files without our interference. The discussion at https://github.com/containers/podman/issues/3274 helped my understanding.
I believe the reason for the opaquely different behavior across users is that some users had been created without a password and were never logged in to directly. Instead, some other user would become them via `sudo su ...`, and as a consequence the `/run/user/$UID` directory would never be created (see https://www.freedesktop.org/software/systemd/man/pam_systemd.html, item 1).
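The difference is easy to check (a sketch; UID 1003 as in our case):

```
# after becoming a user via 'sudo su', no PAM login session is opened,
# so pam_systemd never creates the runtime directory:
$ ls -ld /run/user/1003
ls: cannot access '/run/user/1003': No such file or directory
# after a proper login (ssh or console), the directory exists
```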
I am now trying to have podman use `/run/user/$UID` for all users, which I am struggling with quite a bit. First, I run `sudo loginctl enable-linger USERNAME` (https://www.freedesktop.org/software/systemd/man/loginctl.html#enable-linger%20USER%E2%80%A6) to have the `/run/user/$UID` directory available reliably. Next, I manually modify the `runroot` entry within `.config/containers/storage.conf` to match that directory, e.g. `runroot = "/run/user/1003"`. Still, that won't make podman change its mind about the `runroot`, and neither does `export XDG_RUNTIME_DIR=/run/user/$UID`, which is referred to as a default according to https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md#storageconf.
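Condensed, the attempts look like this (UID 1003 and user name as in our case):

```
# keep /run/user/$UID around without an open login session
$ sudo loginctl enable-linger fireworks

# point the rootless run root there in storage.conf
$ grep runroot ~/.config/containers/storage.conf
runroot = "/run/user/1003"

# and/or set the documented default explicitly
$ export XDG_RUNTIME_DIR=/run/user/1003
```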
The `podman info` output always points to `/tmp/run-1003`, no matter what:
```
$ podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module_el8.2.0+305+5e198a41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: a2b11288060ebd7abd20e0b4eb1a834bbf0aec3e'
  Distribution:
    distribution: '"centos"'
    version: "8"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
  MemFree: 2808565760
  MemTotal: 8189861888
  OCIRuntime:
    name: runc
    package: runc-1.0.0-65.rc10.module_el8.2.0+305+5e198a41.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 3246157824
  SwapTotal: 8497655808
  arch: amd64
  cpus: 2
  eventlogger: journald
  hostname: simdata.vm.uni-freiburg.de
  kernel: 4.18.0-193.19.1.el8_2.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /bin/slirp4netns
    Package: slirp4netns-0.4.2-3.git21fdece.module_el8.2.0+305+5e198a41.x86_64
    Version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  uptime: 566h 9m 11.03s (Approximately 23.58 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /home/fireworks/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-5.module_el8.2.0+305+5e198a41.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  GraphRoot: /home/fireworks/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 0
  RunRoot: /tmp/run-1003
  VolumePath: /home/fireworks/.local/share/containers/storage/volumes
```
When explicitly specifying the `runroot` on the command line, there is the error:

```
$ podman --runroot /run/user/1003 info
Error: could not get runtime: database storage temporary directory (runroot) "/tmp/run-1003" does not match our storage temporary directory (runroot) "/run/user/1003": database configuration mismatch
```
I am at a loss. Some clear errors and warnings plus concise documentation on the (default) behavior would help a lot.
We'd be very grateful if you could spin up a PR with an addition to the Troubleshooting guide. Or, if you'd rather, send along an e-mail with what should be in the guide for this issue and I can throw the PR together.
I might put that together. For the record: the last bit necessary was an `rm ~/.local/share/containers/storage/libpod/bolt_state.db` to have podman accept the modified `runroot`. Is that documented somewhere? Would the behavior be the same in the current version? That is something I cannot test.
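In summary, the sequence that eventually worked for us (a sketch for our UID 1003; note the last step discards podman's state database):

```
$ sudo loginctl enable-linger fireworks   # keep /run/user/1003 alive
$ vi ~/.config/containers/storage.conf    # set runroot = "/run/user/1003"
$ rm ~/.local/share/containers/storage/libpod/bolt_state.db
$ podman info | grep -i runroot           # now reports /run/user/1003
```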
I'll have to lean on @mheon about the boltdb doc and behavior. Thoughts, Matt?
The boltdb bit is expected: we won't let you swap storage paths for existing containers (bad things can happen if we swap directories mid-flight; files and directories we create can get lost, causing unexpected behavior). The only real way of migrating is to run `podman system reset` and effectively wipe all existing state (we recognize this isn't very convenient, but it's the best we can do for now).
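For reference, the reset is a single (destructive) command, roughly:

```
# wipes all containers, images and volumes for this user and
# re-creates storage with the currently configured paths
$ podman system reset
```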
A friendly reminder that this issue had no activity for 30 days.
I am going to close this due to lack of movement. Reopen if you want to add documentation.
/kind bug
NOTE: This issue arises on a system we use in production; thus I am not in a position to arbitrarily upgrade to recent versions for testing. If this issue has already been resolved elsewhere, please just point to that fix and close this issue again. Thanks.
Description
From time to time, podman loses track of the running containers' states.
Amongst others, we use the following mongod pod https://github.com/IMTEK-Simulation/mongod-on-smb, but the issue arises independently of the actual container composition.
Steps to reproduce the issue:

1. Build and launch a pod on rootless podman (in the output sample below, the above-mentioned mongod services; see the sketch after this list).
2. Let it run for days and weeks.
3. Come back to check state, look at logs, enter an interactive shell on a running container, ...
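A compressed sketch of these steps (pod and container names hypothetical; our actual pod is the mongod composition linked above):

```
# 1. build and launch a long-running rootless pod (names hypothetical)
$ podman pod create --name mongod-pod
$ podman run -d --pod mongod-pod --name mongod docker.io/library/mongo

# 2. let it run past the systemd-tmpfiles age threshold (a week on /tmp)

# 3. come back and inspect
$ podman ps                          # services show "created", not "running"
$ podman exec -it mongod /bin/bash   # fails, although mongod is still alive
```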
Describe the results you received:

In the output below, the user name is `fireworks`. podman now lists the services within the mentioned pod as "created", not as "running", and complains when trying to enter a container, when trying to restart, or when trying to start. The latter error actually arises from occupied ports, as the mongo port has not been released; in fact, (all?) processes that are supposed to be running within the container are still alive. The pod can only be restarted after cleaning up manually and subsequently relaunching it.
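Something along these lines (the exact commands shown here are hypothetical; what we actually ran depended on the leftover state):

```
# kill the orphaned container processes still holding the ports
# (process name and user hypothetical)
$ pkill -u fireworks mongod
# force-remove the stale pod and its containers before relaunching
$ podman pod rm --force mongod-pod
```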
Describe the results you expected:

Correct tracking of container states, i.e. containers still marked as running and accessible via `podman exec ...`.

Additional information you deem important (e.g. issue happens only occasionally):
CentOS version:
The containers are started with podman-compose installed within a minimal venv, using a slightly adapted podman-compose that allows timeouts > 1 s when shutting down containers (otherwise equivalent to upstream podman-compose, see https://github.com/containers/podman-compose/compare/devel...jotelha:20200524_down_timeout). This, however, is unlikely to be related to the issue.
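Roughly like this (the exact install invocation is assumed here; the branch is the one linked above):

```
$ python3 -m venv ~/venvs/podman-compose
$ source ~/venvs/podman-compose/bin/activate
$ pip install git+https://github.com/jotelha/podman-compose@20200524_down_timeout
```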
Output of `podman version`:

Output of `podman info --debug`:

Package info (e.g. output of `rpm -q podman` or `apt list podman`):

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
No and Yes.
The issue arises on a system we use in production, thus I am not in a position to arbitrarily upgrade to recent versions for testing. In addition, the issue only arises after days or weeks.
I did not find anything related on https://github.com/containers/podman/blob/master/troubleshooting.md.
Additional environment details (AWS, VirtualBox, physical, etc.):
In our setup, CentOS and podman run on a virtual machine provided by the University of Freiburg's Rechenzentrum.