Open ghost opened 1 year ago
It's more accurate to say that we can't make them not terminate when the container exits. The kernel enforces the rule that any PID namespace will kill every process in the namespace if PID 1 in the namespace dies; Podman will take down PID 1, guaranteeing that the kernel will unwind the rest of the namespace. For containers without a PID namespace, it's a bit trickier, but we do have an accurate list of processes in the container, which we then individually kill as part of stopping the container. In short, I strongly doubt your Podman reproducer actually does what you think it does; the kernel simply won't allow that to happen.
Sorry if I didn't explain it in enough depth. I'm not talking about processes lingering when the container is stopped, as that has never been the case, just as you said.
With `toolbox` / `distrobox` executing commands inside of a container, the container is NOT stopped after the command finishes.
And the parent I'm talking about isn't PID 1 of the container, but the `podman exec` process in the terminal emulator. I guess a better term should be used here, since structurally the `podman exec` process isn't a direct parent of the container process.
This is a screenshot which represents the issue better:
If I try to close the terminal emulator, it'll prompt the following:
If I press "Close Terminal", `sh`, `toolbox` and `podman` (which runs the `exec` command) will be terminated because they're child processes of the vte session.
However, notice that `conmon` and its child process `sleep` aren't part of `gnome-terminal-server`. When `sh` is terminated, `podman` (the `exec` command) will be terminated but the corresponding `conmon` process will be kept intact. As a result, `sleep 30` isn't terminated properly.
And `sleep 30` is only used for demonstration. In reality one could run something resource intensive, and then close the terminal emulator without knowing it is still lingering in the background.
This is probably only an issue for the pet container use case. `toolbox` / `distrobox` tend to start a trap program inside the container to keep it running. Anything interactive is executed by `podman exec`, hence this issue.
The feature request, to be precise, is to add an optional flag that makes the `conmon` process terminate when the corresponding `podman` process is dead.
@mheon I think the request is basically to not double fork conmon and not let it create a new process group to keep it attached to the podman parent process.
Yes. This would work as well.
@Luap99 Don't know if that works. Conmon dying is only going to take out the first PID the exec session started; anything else it did probably just reparents on top of PID 1 in the container. So we can definitely kill a single-process exec session, but with a `podman exec -ti $ctr bash` like Toolbox does, we only get bash, not anything bash was doing (unless the shell automatically kills its children on exit, not something we can guarantee for every program).
We don't really have a robust way of tracking what processes were spawned from an exec session right now. We'd basically have to walk the process tree in the container, which seems potentially racy. On CGv2, a child cgroup might be a solution? Just need to make sure it doesn't interfere with the container itself being stopped...
I believe it walks the cgroup and kills all of the pids within the cgroup, or at least I remember this is what we wrote many years ago.
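The "walk the cgroup and kill all of the pids" idea can be sketched roughly as follows. This is only an illustration, not Podman's actual code: the function name `kill_cgroup_processes` is hypothetical, and it assumes a cgroup v2 style directory that exposes its member PIDs in a `cgroup.procs` file, one PID per line.

```python
import os
import signal
from pathlib import Path


def kill_cgroup_processes(cgroup_dir, sig=signal.SIGKILL):
    """Send `sig` to every PID listed in <cgroup_dir>/cgroup.procs.

    Hypothetical helper for illustration. Returns the PIDs that were
    actually signalled.
    """
    procs_file = Path(cgroup_dir) / "cgroup.procs"
    signalled = []
    for entry in procs_file.read_text().split():
        pid = int(entry)
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            # The process exited between reading the file and signalling it.
            continue
        signalled.append(pid)
    return signalled
```

A single pass like this is racy in the way described above: a process can fork after the file has been read, so a real implementation would likely need to repeat the walk until the file is empty, or freeze the cgroup first. On recent kernels with cgroup v2, writing `1` to the `cgroup.kill` file of a dedicated child cgroup may be a simpler alternative.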
I wonder if Toolbx could detect this scenario and explicitly terminate the process that it had launched inside the container.
A friendly reminder that this issue had no activity for 30 days.
This seems like more of an issue for toolbx rather than podman.
> This seems like more of an issue for toolbx rather than podman.

Umm... it's not really clear to me what Toolbx could do here. Is there a recommended way to get to the process ID of the conmon process?
> @Luap99 Don't know if that works. Conmon dying is only going to take out the first PID the exec session started;

I think it's good enough if conmon died and took out the first PID that the `exec` session started, because ...

> anything else it did, probably just reparents on top of PID 1 in the container. So we can definitely kill a single-process exec session, but a `podman exec -ti $ctr bash` like Toolbox does, we only get bash, not anything bash was doing (unless the shell automatically kills its children on exit, not something we can guarantee for every program).

... if this was a shell directly running on the host without involving any containers, the expectation is that closing the terminal emulator takes out the shell and anything that's willing to die with it. If someone started a process in the background (say, `sleep +Inf &`), then it's OK if it keeps running in the background.
@giuseppe ameliorated one problematic outcome of this: the processes inside the `exec` sessions blocking shutdown. See https://github.com/containers/podman/pull/17025
However, it's still worth trying to ensure that the processes inside the `exec` session go away as soon as the terminal emulator is closed, just as happens when one is working directly on the host.
I have to say that I am a bit puzzled that the processes are outliving their controlling terminal. I know there's an inner nested terminal device for the container, but isn't it supposed to go away with the outer terminal?
> This seems like more of an issue for toolbx rather than podman.

> Umm... it's not really clear to me what Toolbx could do here. Is there a recommended way to get to the process ID of the conmon process?

@mheon @rhatdan @Luap99 @giuseppe Could one of you please help answer this question?
We are brainstorming various options at https://github.com/containers/toolbox/pull/1207 but it's not clear if it's possible for the `podman exec` caller to get the process ID of `conmon(8)` or the process inside the container.
Also, it's not clear to me why `podman exec --interactive --tty` should not terminate the foreground container process with it. Especially when `podman exec -it` is getting terminated by a `SIGHUP` from its controlling terminal.
I wonder if it will be easier for you to just use the OCI runtime to do the exec.
e.g. if you do `crun exec` you circumvent podman and conmon, I am fine to add something like `--die-with-parent` to crun in a similar way to what bwrap does.
Can you please play with it and see if "crun exec" does all you need?
Adding it to Podman/conmon will be much more complicated, we will need to change the way conmon works to not perform a double fork.
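For context, bwrap's `--die-with-parent` builds on the Linux `prctl(PR_SET_PDEATHSIG)` mechanism, which asks the kernel to signal a process when its parent goes away. The sketch below shows that mechanism from Python via `ctypes`. It is not crun or Podman code, and the helper names (`set_parent_death_signal`, `get_parent_death_signal`, `spawn_die_with_parent`) are made up for illustration.

```python
import ctypes
import os
import signal
import subprocess

# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_GET_PDEATHSIG = 2

_libc = ctypes.CDLL(None, use_errno=True)


def set_parent_death_signal(sig=signal.SIGKILL):
    """Ask the kernel to send `sig` to the caller when its parent dies."""
    if _libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)), 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_parent_death_signal():
    """Return the caller's current parent-death signal (0 if unset)."""
    value = ctypes.c_int(0)
    if _libc.prctl(PR_GET_PDEATHSIG, ctypes.byref(value), 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return value.value


def spawn_die_with_parent(argv):
    """Start argv so that it receives SIGKILL if this process exits first."""
    parent_pid = os.getpid()

    def _in_child():
        # Runs after fork() and before exec(); the setting survives exec()
        # for ordinary (non-setuid) binaries.
        set_parent_death_signal(signal.SIGKILL)
        # Close the race where the parent died before prctl() took effect.
        if os.getppid() != parent_pid:
            os._exit(1)

    return subprocess.Popen(argv, preexec_fn=_in_child)
```

This only covers a direct parent and child. In the double-fork arrangement described above, conmon is deliberately detached from the `podman exec` process, which is why the same trick cannot simply be switched on there.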
That said, `podman run` is forwarding all signals (well, the ones that can be caught) into the container, so maybe `podman exec` should do that too.
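As a rough illustration of what "forwarding signals" means here, the sketch below relays catchable signals received by a wrapper process to a child it started. It is a generic Python example and not how Podman implements signal proxying; `FORWARDED`, `forward_signals_to` and `run_forwarding_signals` are hypothetical names.

```python
import signal
import subprocess

# Signals to relay; SIGKILL and SIGSTOP cannot be caught, so they are absent.
FORWARDED = (signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGQUIT)


def forward_signals_to(child, signals=FORWARDED):
    """Install handlers that relay `signals` to `child`.

    Must be called from the main thread. Returns a function that restores
    the previous handlers.
    """

    def _relay(signum, _frame):
        # send_signal() is a no-op if the child has already been reaped.
        child.send_signal(signum)

    previous = {sig: signal.signal(sig, _relay) for sig in signals}

    def restore():
        for sig, handler in previous.items():
            signal.signal(sig, handler)

    return restore


def run_forwarding_signals(argv):
    """Run argv, relaying catchable signals to it until it exits."""
    child = subprocess.Popen(argv)
    restore = forward_signals_to(child)
    try:
        return child.wait()
    finally:
        restore()
```

With this in place, a `SIGHUP` delivered to the wrapper when the terminal closes would reach the child as well, instead of only terminating the wrapper.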
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind feature
Description
Add a flag to `podman exec` to make the exec session terminate with its parent, similar to bubblewrap's `bwrap --die-with-parent`.

Immutable distributions make use of `toolbox` / `distrobox` to provide a mutable environment. A common use is to run commands directly within the container (`toolbox run [COMMAND]` / `distrobox enter -- [COMMAND]`). Since they use an exec session, they have the same limitation of not terminating child processes when the terminal emulator is closed.

Steps to reproduce the issue:
Open System Monitor / the Task Manager equivalent in your desktop environment, search for `sleep`
Run the following command in your terminal emulator (either one will work):
podman:
toolbox:
distrobox:
Describe the results you received:
`sleep 30` still runs within the container.

Describe the results you expected:
Nah, this is expected, hence this feature request.
Additional information you deem important (e.g. issue happens only occasionally):
Output of `podman version`:

Output of `podman info`:

```
host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.4-3.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 60.53
    systemPercent: 23.27
    userPercent: 16.2
  cpus: 4
  distribution:
    distribution: fedora
    variant: silverblue
    version: "36"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.0.5-200.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 224858112
  memTotal: 16705081344
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 28844679168
  swapTotal: 34359734272
  uptime: 42h 55m 22.00s (Approximately 1.75 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/user/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/user/.local/share/containers/storage
  graphRootAllocated: 510389125120
  graphRootUsed: 201430126592
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 104
  runRoot: /run/user/1000/containers
  volumePath: /var/home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580699
  BuiltTime: Thu Sep 8 03:58:19 2022
  GitCommit: ""
  GoVersion: go1.18.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.1
```

Package info (e.g. output of `rpm -q podman` or `apt list podman` or `brew info podman`):

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Additional environment details (AWS, VirtualBox, physical, etc.):
Fedora Silverblue 36