containers / toolbox

Tool for interactive command line environments on Linux
https://containertoolbx.org/
Apache License 2.0

Optionally make exec session terminate with parent #1204

Open ghost opened 1 year ago

ghost commented 1 year ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

Add a flag to podman exec that makes the exec session terminate with its parent, similar to bubblewrap's bwrap --die-with-parent.

Immutable distributions use toolbox / distrobox to provide a mutable environment. A common pattern is to run commands directly inside the container (toolbox run [COMMAND] / distrobox enter -- [COMMAND]). Because these use an exec session, they share the same limitation: child processes are not terminated when the terminal emulator is closed.
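For context, bubblewrap's --die-with-parent relies on the Linux parent-death signal, prctl(PR_SET_PDEATHSIG). The following is a minimal Python sketch of that mechanism only, assuming Linux and plain processes rather than containers; the helper names (_die_with_parent, _is_gone, demo) are hypothetical, and this is not code from podman, conmon or bwrap. An intermediate process stands in for podman exec; when it dies, the sleep it started is killed only if the parent-death signal was requested.

```python
import ctypes
import os
import signal
import subprocess
import time

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>
_libc = ctypes.CDLL(None, use_errno=True)


def _die_with_parent(sig=signal.SIGKILL):
    """Runs in the child after fork() and before exec(): ask the kernel to
    deliver `sig` to this process when the thread that forked it exits."""
    if _libc.prctl(ctypes.c_int(PR_SET_PDEATHSIG), ctypes.c_ulong(int(sig)),
                   ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _is_gone(pid):
    """True if `pid` no longer exists or is a zombie waiting to be reaped."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            state = f.read().rsplit(")", 1)[1].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return True
    return state in ("Z", "X")


def demo(die_with_parent=True, timeout=5.0):
    """Fork an intermediate parent (standing in for `podman exec`), let it
    start `sleep 30`, then let the intermediate die, as closing the terminal
    would. Returns (grandchild_gone, grandchild_pid)."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            os.close(r)
            hook = _die_with_parent if die_with_parent else None
            child = subprocess.Popen(["sleep", "30"], preexec_fn=hook)
            os.write(w, str(child.pid).encode())
        finally:
            os._exit(0)  # the intermediate parent dies here
    os.close(w)
    grandchild = int(os.read(r, 32))
    os.close(r)
    os.waitpid(middle, 0)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if _is_gone(grandchild):
            return True, grandchild
        time.sleep(0.05)
    return False, grandchild
```

With demo(True) the sleep disappears as soon as its parent exits; with demo(False) it keeps running, which mirrors the behavior reported here. The signal fires when the thread that forked the child exits, so it only helps when the process that should die is a direct child of the one going away.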

Steps to reproduce the issue:

  1. Open System Monitor (or the equivalent task manager in your desktop environment) and search for sleep.

  2. Run one of the following sets of commands in your terminal emulator (any of them will work):

podman:

podman run --rm -it \
    --name debian \
    --entrypoint /bin/sh \
    docker.io/library/debian:11

# In a new terminal emulator window
podman exec debian sleep 30

toolbox:

toolbox create
toolbox run sleep 30

distrobox:

distrobox create
distrobox enter -- sleep 30

  3. Then try to close the terminal emulator. It will prompt for confirmation, something like this:

[screenshot: terminal close-confirmation prompt]

  4. Confirm closing it, then look at System Monitor again.

Describe the results you received:

sleep 30 keeps running inside the container.

Describe the results you expected:

Nothing different: this is the current, expected behavior, hence this feature request.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.2.1
API Version:  4.2.1
Go Version:   go1.18.5
Built:        Thu Sep  8 03:58:19 2022
OS/Arch:      linux/amd64

Output of podman info:

```
host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.4-3.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 60.53
    systemPercent: 23.27
    userPercent: 16.2
  cpus: 4
  distribution:
    distribution: fedora
    variant: silverblue
    version: "36"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.0.5-200.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 224858112
  memTotal: 16705081344
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 28844679168
  swapTotal: 34359734272
  uptime: 42h 55m 22.00s (Approximately 1.75 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/user/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/user/.local/share/containers/storage
  graphRootAllocated: 510389125120
  graphRootUsed: 201430126592
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 104
  runRoot: /run/user/1000/containers
  volumePath: /var/home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580699
  BuiltTime: Thu Sep 8 03:58:19 2022
  GitCommit: ""
  GoVersion: go1.18.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.1
```

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

podman-4.2.1-2.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Additional environment details (AWS, VirtualBox, physical, etc.):

Fedora Silverblue 36

mheon commented 1 year ago

It's more accurate to say that we can't make them not terminate when the container exits. The kernel enforces the rule that any PID namespace will kill every process in the namespace if PID 1 in the namespace dies; Podman will take down PID 1, guaranteeing that the kernel will unwind the rest of the namespace. For containers without a PID namespace, it's a bit trickier, but we do have an accurate list of processes in the container, which we then individually kill as part of stopping the container. In short, I strongly doubt your Podman reproducer actually does what you think it does; the kernel simply won't allow that to happen.

ghost commented 1 year ago

Sorry, I should have explained this in more detail. I'm not talking about processes lingering after the container is stopped; as you said, that has never been the case.

With toolbox / distrobox executing commands inside of container, the container is NOT stopped after the command is finished.

And the parent I'm talking about isn't PID 1 of the container, but the podman exec process in the terminal emulator. A better term is probably needed here, since structurally the podman exec process isn't a direct parent of the container process.

This screenshot represents the issue better:

[screenshot: process tree in System Monitor]

If I try to close the terminal emulator, it prompts the following:

[screenshot: terminal close-confirmation prompt]

If I press "Close Terminal", sh, toolbox and podman (which runs the exec command) are terminated because they are child processes of the VTE session.

However, notice that conmon and its child process, sleep, are not descendants of gnome-terminal-server. When sh is terminated, podman (the exec command) is terminated as well, but the corresponding conmon process keeps running. As a result, sleep 30 is not terminated.

sleep 30 is only for demonstration. In practice, one could run something resource-intensive and then close the terminal emulator without realizing that it keeps running in the background.

This is probably only an issue for the pet-container use case. toolbox / distrobox tend to start a long-running trap program inside the container to keep it running, and everything interactive is executed via podman exec, hence this issue.

To be precise, the feature request is for an optional flag that makes the conmon process terminate when the corresponding podman process dies.

Luap99 commented 1 year ago

@mheon I think the request is basically to not double fork conmon and not let it create a new process group to keep it attached to the podman parent process.

ghost commented 1 year ago

Yes. This would work as well.

mheon commented 1 year ago

@Luap99 Don't know if that works. Conmon dying is only going to take out the first PID the exec session started; anything else it did, probably just reparents on top of PID 1 in the container. So we can definitely kill a single-process exec session, but a podman exec -ti $ctr bash like Toolbox does, we only get bash, not anything bash was doing (unless the shell automatically kills its children on exit, not something we can guarantee for every program).
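mheon's point can be reproduced without containers. In this sketch (plain processes on Linux; kill_leader_then_group, _is_gone and _wait_gone are hypothetical names), killing only a non-interactive shell leaves its background sleep running, while signalling the whole process group with os.killpg reaches it. That works only because job control is off in sh -c; an interactive shell puts each job in its own process group, which is part of why this cannot be guaranteed for every program.

```python
import os
import signal
import subprocess
import time


def _is_gone(pid):
    """True if `pid` no longer exists or is a zombie waiting to be reaped."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            state = f.read().rsplit(")", 1)[1].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return True
    return state in ("Z", "X")


def _wait_gone(pid, timeout=5.0):
    """Poll until `pid` is gone; False if it is still alive after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if _is_gone(pid):
            return True
        time.sleep(0.05)
    return False


def kill_leader_then_group():
    """Start a shell with a background child in a fresh process group, kill
    only the shell, then kill the whole group.

    Returns (child_survived_shell_kill, child_gone_after_killpg)."""
    shell = subprocess.Popen(
        ["sh", "-c", "sleep 30 & echo $!; wait"],
        stdout=subprocess.PIPE, text=True,
        start_new_session=True,  # shell becomes group leader: pgid == shell.pid
    )
    child = int(shell.stdout.readline())
    shell.kill()  # like killing only the first PID of an exec session
    shell.wait()
    survived = not _wait_gone(child, timeout=0.5)
    os.killpg(shell.pid, signal.SIGKILL)  # reaches every member of the group
    gone = _wait_gone(child)
    shell.stdout.close()
    return survived, gone
```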

We don't really have a robust way of tracking what processes were spawned from an exec session right now. We'd basically have to walk the process tree in the container, which seems potentially racy. On CGv2, a child cgroup might be a solution? Just need to make sure it doesn't interfere with the container itself being stopped...
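Walking the process tree, as mheon describes, could look roughly like the sketch below: read each /proc/<pid>/stat, build a parent-to-children map, and collect everything under a root PID. descendants is a hypothetical helper, not podman code, and it exhibits both problems raised above: processes can appear or exit during the scan, and anything already reparented to PID 1 is no longer found under the root.

```python
import os


def _children_by_parent():
    """Map each PID to the list of its direct children, from /proc/*/stat."""
    children = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                ppid = int(f.read().rsplit(")", 1)[1].split()[1])
        except (FileNotFoundError, ProcessLookupError):
            continue  # exited mid-scan: the race mentioned above
        children.setdefault(ppid, []).append(int(entry))
    return children


def descendants(root_pid):
    """All PIDs currently below root_pid in the process tree (one snapshot)."""
    children = _children_by_parent()
    found, stack = [], [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found
```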

rhatdan commented 1 year ago

I believe it walks the cgroup and kills all of the pids within the cgroup, or at least I remember this is what we wrote many years ago.
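A minimal sketch of the cgroup-based approach, assuming cgroup v2 and a hypothetical per-exec-session cgroup directory (podman's actual implementation may differ; kill_cgroup is an illustrative name). If the kernel provides cgroup.kill (Linux 5.14+), writing 1 to it kills every process in the cgroup and its descendant cgroups in one step; otherwise the sketch falls back to signalling each PID listed in cgroup.procs, which is the racy per-process approach.

```python
import os
import signal
from pathlib import Path


def kill_cgroup(cgroup_dir, sig=signal.SIGKILL):
    """Kill the processes in a cgroup v2 directory.

    Returns the PIDs signalled individually ([] when cgroup.kill was used)."""
    cgroup = Path(cgroup_dir)
    kill_file = cgroup / "cgroup.kill"
    if sig == signal.SIGKILL and kill_file.exists():
        # Linux 5.14+: kills the whole cgroup subtree in one operation.
        kill_file.write_text("1")
        return []
    killed = []
    # Fallback: one snapshot of this cgroup's own members (not its child
    # cgroups). Processes forked after the read are missed.
    for line in (cgroup / "cgroup.procs").read_text().split():
        pid = int(line)
        try:
            os.kill(pid, sig)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already gone
    return killed
```

A child cgroup per exec session, as mheon suggests, would let a call like this target one session without touching the container's other processes.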

debarshiray commented 1 year ago

I wonder if Toolbx could detect this scenario and explicitly terminate the process that it had launched inside the container.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 1 year ago

This seems like more of an issue for toolbx rather than podman.

debarshiray commented 1 year ago

> This seems like more of an issue for toolbx rather than podman.

Umm... it's not really clear to me what Toolbx could do here. Is there a recommended way to get to the process ID of the conmon process?

debarshiray commented 1 year ago

> @Luap99 Don't know if that works. Conmon dying is only going to take out the first PID the exec session started;

I think it's good enough if conmon died and took out the first PID that the exec session started, because ...

> anything else it did, probably just reparents on top of PID 1 in the container. So we can definitely kill a single-process exec session, but a podman exec -ti $ctr bash like Toolbox does, we only get bash, not anything bash was doing (unless the shell automatically kills its children on exit, not something we can guarantee for every program).

... if this was a shell directly running on the host without involving any containers, the expectation is that closing the terminal emulator takes out the shell and anything that's willing to die with it. If someone started a process in the background (say, sleep +Inf &), then it's OK if it keeps running in the background.

debarshiray commented 1 year ago

@giuseppe ameliorated one problematic outcome of this - the processes inside the exec sessions blocking shutdown. See https://github.com/containers/podman/pull/17025

However, it's still worth trying to ensure that the processes inside the exec session go away as soon as the terminal emulator is closed, just as happens when one is working directly on the host.

I have to say that I am a bit puzzled that the processes are outliving their controlling terminal. I know there's an inner nested terminal device for the container, but isn't it supposed to go away with the outer terminal?

debarshiray commented 1 year ago

Following up on my earlier question: is there a recommended way to get the process ID of the conmon process?

@mheon @rhatdan @Luap99 @giuseppe Could one of you please help answer this question?

We are brainstorming various options at https://github.com/containers/toolbox/pull/1207 but it's not clear if it's possible for the podman exec caller to get the process ID of conmon(8) or the process inside the container.

Also, it's not clear to me why podman exec --interactive --tty should not terminate the foreground container process with it. Especially when podman exec -it is getting terminated by a SIGHUP from its controlling terminal.

giuseppe commented 1 year ago

I wonder if it will be easier for you to just use the OCI runtime to do the exec.

For example, crun exec bypasses podman and conmon. I am fine with adding something like --die-with-parent to crun, similar to what bwrap does.

Can you please play with it and see if "crun exec" does all you need?

Adding it to Podman/conmon will be much more complicated, we will need to change the way conmon works to not perform a double fork.

Luap99 commented 1 year ago

That said, podman run forwards all signals (well, the ones that can be caught) into the container, so maybe podman exec should do that too.
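A minimal Python sketch of the kind of signal forwarding Luap99 mentions, assuming plain processes rather than a container (run_forwarding_signals and FORWARDED are hypothetical names, not podman's API). The wrapper relays catchable signals such as SIGHUP to its child, so a hangup from the terminal reaches the command instead of only killing the wrapper. SIGKILL and SIGSTOP cannot be caught, so they can never be forwarded this way.

```python
import signal
import subprocess

# Catchable signals to relay; SIGKILL and SIGSTOP cannot be handled at all.
FORWARDED = (signal.SIGHUP, signal.SIGINT, signal.SIGQUIT,
             signal.SIGTERM, signal.SIGUSR1, signal.SIGUSR2)


def run_forwarding_signals(argv):
    """Run argv, relay FORWARDED signals received by this process to it,
    and return the child's exit status (negative if killed by a signal).
    Must be called from the main thread, since it installs handlers."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        child.send_signal(signum)  # skipped by Popen if the child was reaped

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        return child.wait()
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

If podman exec behaved like this wrapper, the SIGHUP it receives when the terminal closes would be passed on to the process inside the container; whether that is the right design for exec sessions is exactly what this thread is debating.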

ref https://github.com/containers/toolbox/issues/1400