Processes started by podman exec may continue to run after podman exec terminates

hmkemppainen commented 1 year ago

Issue Description

Processes started by podman exec (without --detach) may be left running when the podman process is terminated.

I don't know if this is a bug or a feature, but it can make it very difficult for the parent process to correctly manage its children and this can lead to annoying failure modes.

Steps to reproduce the issue

Use podman run to start a container running netcat:

    $ podman run --rm -i alpine nc -vnlkp 1234

Use podman exec to start another netcat, connecting to the first one:

    $ podman exec -l -i nc -vn 127.0.0.1 1234

Verify that you have four related processes:

    $ pgrep -af 1234
    1542631 podman run --rm -i alpine nc -vnlkp 1234
    1542661 nc -vnlkp 1234
    1542664 podman exec -l -i nc -vn 127.0.0.1 1234
    1542691 nc -vn 127.0.0.1 1234

Kill the podman exec process:

    $ pkill -f podman\ exec

Observe that podman exec is gone but the netcat is still running (and remains connected to the first netcat!):

    $ pgrep -af 1234
    1542631 podman run --rm -i alpine nc -vnlkp 1234
    1542661 nc -vnlkp 1234
    1542691 nc -vn 127.0.0.1 1234
Describe the results you received

The process started under podman exec is still running, even though the podman process itself is gone.

This behavior confuses parent processes that care less about the lifetime of the podman glue than about the actual process they are trying to run inside the container. The usual ways of killing the process of interest, or of checking whether it is still alive, don't work. Calling setsid() after fork and before exec'ing podman, and then killing the process group, doesn't kill the child either, because the child lives in yet another session.
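
The session and process-group point can be illustrated without podman. The following is a minimal sketch, not a reproduction of what podman does internally: a wrapper shell stands in for podman exec, `sleep 30` stands in for the process started in the container, and the util-linux `setsid` command is assumed to be installed. It only shows that a signal sent to the wrapper's process group does not reach a process that has moved to another session.

```shell
#!/bin/sh
# Sketch only: `sleep 30` stands in for the process started inside the
# container, and the "wrapper" shell stands in for podman exec.
# Assumes Linux with the util-linux `setsid` command installed.

pidfile=$(mktemp)

# The wrapper gets its own session and process group, as a parent would
# arrange with setsid() before exec'ing podman. It records its own PID,
# then starts a worker in yet another session and records that PID too.
setsid sh -c 'echo $$ > "$1"; setsid sleep 30 & echo $! >> "$1"; wait' \
    wrapper "$pidfile" &

# Wait (up to about 5 seconds) until both PIDs have been written.
tries=0
while [ "$(wc -l < "$pidfile")" -lt 2 ] && [ "$tries" -lt 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done
wrapper_pid=$(sed -n 1p "$pidfile")
worker_pid=$(sed -n 2p "$pidfile")

# Give the worker time to finish moving into its own session.
sleep 1

# Signal the wrapper's whole process group (a negative PID means "group").
kill -s TERM -- "-$wrapper_pid"
wait    # reap the wrapper

# The worker is in a different session and process group, so the group
# kill is not delivered to it.
if kill -0 "$worker_pid" 2>/dev/null; then
    worker_survived=yes
else
    worker_survived=no
fi
echo "worker survived group kill: $worker_survived"

# Clean up the stray worker and the temporary file.
kill "$worker_pid" 2>/dev/null
rm -f "$pidfile"
```

On a typical Linux system this should report that the worker survived, which mirrors the situation in the reproducer: the parent can terminate everything in the group it created, but the process it actually cares about is elsewhere.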

Describe the results you expected

The child process should die along with podman exec.

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.08
    systemPercent: 0.16
    userPercent: 0.76
  cpus: 24
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    version: "38"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.3.8-200.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1579749376
  memTotal: 67328802816
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.5-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.5
      commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589668352
  swapTotal: 8589930496
  uptime: 856h 48m 38.00s (Approximately 35.67 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/duclare/.config/containers/storage.conf
  containerStore:
    number: 31
    paused: 0
    running: 3
    stopped: 28
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/duclare/.local/share/containers/storage
  graphRootAllocated: 498387124224
  graphRootUsed: 185795895296
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 10
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/duclare/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 1685123928
  BuiltTime: Fri May 26 20:58:48 2023
  GitCommit: ""
  GoVersion: go1.20.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

No response

vrothberg commented 1 year ago

Thanks for opening the issue and including the reproducer, @hmkemppainen!

I am surprised this doesn't work, but I have not looked at the code.

Cc: @mheon

mheon commented 1 year ago

podman exec is kind of like podman run: it's just running the attach session associated with the exec'd process. The process in question has been reparented to the container's PID 1 and is associated with the container's cgroups, so once podman exec has successfully started the process in the container, it's no longer associated with the frontend at all, beyond the conmon process forwarding us the session's standard streams.

So this is basically what I'd expect. We do have additional primitives exposed via API for dealing with exec sessions, including the ability to kill running sessions, but we don't expose those via CLI, only the Docker-compat API.
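
Since the CLI has no command for killing an exec session, one possible workaround for callers is to record the process's PID inside the container and signal it through a second podman exec. The sketch below is an illustration under assumptions, not something podman provides: `exec_tracked` and `kill_tracked` are hypothetical helper names, the container image is assumed to ship `sh`, `cat` and `kill` (as alpine does), and the pidfile path must be writable inside the container.

```shell
#!/bin/sh
# Sketch only. Assumes the container image provides `sh`, `cat` and `kill`
# and that the pidfile path is writable inside the container.
# exec_tracked and kill_tracked are hypothetical helper names.

# Override PODMAN to substitute another binary or a stub for dry runs.
PODMAN=${PODMAN:-podman}

# exec_tracked CONTAINER PIDFILE CMD [ARG...]
# Runs CMD in the container via `podman exec -i`. The in-container shell
# writes its PID to PIDFILE and then exec's CMD, so CMD keeps that PID.
exec_tracked() {
    ctr=$1
    pidfile=$2
    shift 2
    "$PODMAN" exec -i "$ctr" sh -c 'echo $$ > "$0"; exec "$@"' "$pidfile" "$@"
}

# kill_tracked CONTAINER PIDFILE [SIGNAL]
# Starts a second exec session that signals the recorded PID (default TERM).
kill_tracked() {
    "$PODMAN" exec "$1" sh -c 'kill "-$1" "$(cat "$0")"' "$2" "${3:-TERM}"
}
```

For the reproducer in this issue, a parent might run `exec_tracked CONTAINER /tmp/nc.pid nc -vn 127.0.0.1 1234 &` and later call `kill_tracked CONTAINER /tmp/nc.pid`, for example from a `trap` handler for SIGTERM. `CONTAINER` and `/tmp/nc.pid` are placeholders. The recorded PID is only meaningful inside the container's PID namespace, which is why the signal is sent from a second exec session rather than from the host.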

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 1 year ago

Does podman exec handle SIGTERM in this case? I.e., does it kill the exec session?

mheon commented 1 year ago

No, we don't sig-proxy for exec.

rhatdan commented 1 year ago

That is unexpected from the user's perspective, at least for this user.

mheon commented 1 year ago

Would you expect that SIGTERM would kill the whole exec session or just PID1?

mheon commented 1 year ago

Well, the first PID of the exec session, not PID1

rhatdan commented 1 year ago

Yes, I think this would be the human expectation. If I am running podman exec -ti qm top and hit ^z, I would expect that process to exit. It probably often does when the TTY connection goes away, but forwarding the signal to the process would help. I don't think we can guarantee that the process exits, but well-behaved ones should.
