Closed: alecello closed this issue 1 year ago
NOTE: After performing some more tests, it looks like the return code of `conmon` is a separate issue, but the insta-kill problem is nonetheless real.
Hi Alessandro,

Thanks for the detailed issue.

My 2c:
Since Quadlet uses `podman kube play` to start K8s YAML files without parsing them, it is not designed to know the name of the pod that will be created. As a result, it has to use `podman kube down` to stop the service.

In addition, as you've shown, the issue happens regardless of Quadlet.

Having said that, we should look into this issue at the `podman kube` level.
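To illustrate the point (a minimal sketch with assumed file locations, not taken from this issue): a Quadlet `.kube` unit only references the YAML file, so the generated service never learns the pod's name and has to tear things down by file, via `podman kube down`:

```shell
# Minimal Quadlet unit sketch (rootless paths; file names are illustrative).
# The unit references only the YAML file -- Quadlet never parses it,
# so the generated service cannot know the pod name defined inside.
mkdir -p ~/.config/containers/systemd
cat > ~/.config/containers/systemd/test.kube <<'EOF'
[Kube]
# Relative paths are resolved against the unit file's directory
Yaml=test.yml
EOF
systemctl --user daemon-reload   # let the generator create test.service
```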
Thanks for reaching out, @alecello.

This issue has been fixed by commit 08b0d93ea35e. You can see in the tests that the service now transitions correctly to `inactive` and no longer to `failed` (https://github.com/containers/podman/commit/08b0d93ea35e59b388b7acf0bdc7464346a83c3a#diff-2a27b9858e6debd198c5d67a930d3dbe4ac2caa7d4bc2752daade3061bef17fcR462). We're close to releasing Podman 4.6, which will reach Fedora right after.
Issue Description
While playing around with Podman's systemd generator (and Quadlet) in a Fedora Server 38 VM, I noticed that when running `systemctl stop <unit>` on the service generated from my `.kube` file, the unit enters the failed state due to the main process (the service container's `conmon`) exiting with code `137`, which seems to suggest that `conmon` got `SIGKILL`'d. I then started playing with the bare `podman kube play` / `podman kube stop` commands (without the generator and/or systemd in the mix) and noticed that when running a service that takes some time to shut down after receiving the stop signal (in my tests I used the `marctv/minecraft-papermc-server:latest` Minecraft server image from Docker Hub), I get two different behaviors depending on which command I use to stop the service:

- `podman pod stop testpod`: the pod takes some time to quit (around 4s on my machine); looking at `podman pod logs -f` in the meantime shows some container messages related to the server quitting (saving chunk data to disk et al.)
- `podman kube down test.yml`: the pod quits instantly (the java process too: verified with `watch -n 0.1 "ps aux | grep java"`) and nothing is printed either to the pod logs (which quit instantly as well) or to the system journal

I went on to replicate the issue on my Arch Linux main machine (both environments used podman `4.5.1`) and sure enough the same behavior could be observed.

Before opening this issue I tried removing my custom `containers.conf`, as well as creating a new one that just sets the default container stop timeout to some high value (I tried both `600` and `6000` seconds). I also tried `podman system prune` and `podman system reset`, to no avail. All tests have been run with SELinux in permissive mode (or no SELinux at all for Arch) on an otherwise minimally configured system.

I tried to craft a minimal example that triggers the issue on my end, here it is:
Steps to reproduce the issue
1. Start the pod with `podman kube play <filename>`
2. Keep `podman pod logs -f` and/or a `watch -n 0.1` that includes the container process clearly visible
3. Run `podman pod stop <name>`
4. Run `podman pod start <name>` and await initialization
5. Run `podman kube down <filename>` (condensed into a shell session below)
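Condensed into a shell session (file and pod names as in the example above; the `watch` belongs in a separate terminal):

```shell
podman kube play test.yml            # 1. start the pod
podman pod logs -f testpod &         # 2a. stream container output
watch -n 0.1 "ps aux | grep java"    # 2b. separate terminal: keep the java process visible

podman pod stop testpod              # 3. graceful: ~4s, shutdown messages in the logs
podman pod start testpod             # 4. restart and await initialization
podman kube down test.yml            # 5. instant kill: no shutdown messages anywhere
```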
Describe the results you received
The pod gets terminated uncleanly despite the stop timeout being configured to a high value.
Describe the results you expected
Either one of two outcomes:

- `podman kube down` honors the configured stop timeout and shuts the pod down gracefully, like `podman pod stop` does, or
- in case this behavior of the command is intentional (if this is the case, I was not able to positively determine it from the man page), maybe a `--soft` option to `kube down` could be implemented and used by the generator?
may be implemented and used by the generator?)podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
A QEMU virtual machine with default settings and a single NAT virtual network interface, run in a privileged session.
Additional information
None