containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman run with --init traps SIGTSTP (20) #18095

Closed tophercullen closed 1 year ago

tophercullen commented 1 year ago

Issue Description

Part of our internal test suite covers signal handling: the service code sends various signals to its own PID, including TSTP (20) and CONT (18). When running with Docker, all tests succeed. When running with Podman, one of these signal tests fails. I've traced the failure to the --init flag and/or the init binary it brings in.

Running without --init, both the TSTP and CONT signals are received by the service code.

Running with --init, the CONT signal is received by the service code, but the TSTP signal is not. Other signals may be affected too, but these are the only two I've personally debugged.
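
For context, a self-signalling check of the kind described above might look like the following minimal sketch. The names `record` and `send_to_self` are illustrative and not taken from the actual test suite, and the signal numbers 20 and 18 assume Linux on x86_64.

```python
import os
import signal
import time

# Signal numbers seen by the handler, in arrival order.
received = []

def record(signum, frame):
    # Remember each signal number the handler is invoked with.
    received.append(signum)

def send_to_self(signum, timeout=2.0):
    """Send signum to this process and wait until the handler has run.

    Returns True if the handler fired within the timeout, else False.
    """
    before = len(received)
    os.kill(os.getpid(), signum)
    deadline = time.monotonic() + timeout
    while len(received) == before and time.monotonic() < deadline:
        time.sleep(0.01)
    return len(received) > before

# Installing a handler replaces the default action, so SIGTSTP no longer
# stops the process; it only invokes record().
signal.signal(signal.SIGTSTP, record)
signal.signal(signal.SIGCONT, record)

if __name__ == "__main__":
    for sig in (signal.SIGTSTP, signal.SIGCONT):
        status = "received" if send_to_self(sig) else "NOT received"
        print(sig.name, status)
```

Run as the container's command with and without --init, a check like this reports per signal whether the handler fired.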

Steps to reproduce the issue

  1. Start a simple Python signal trapper:
    podman run -it \
    --init \
    -m 2G --memory-swap=2G \
    --rm \
    "docker.io/library/python:3.9.15-slim-bullseye" \
    python -c "exec('import signal\nimport time\n\ndef SignalHandler_SIGALL(SignalNumber,Frame):\n     print(\"SignalHandler of {} {}\".format(SignalNumber,Frame))\n\nsignal.signal(signal.SIGTSTP,SignalHandler_SIGALL)\nsignal.signal(signal.SIGCONT,SignalHandler_SIGALL)\n\nwhile 1:\n    time.sleep(1)')"
  2. Open another shell.
  3. Run podman container list to get the container ID.
  4. Exec into the container: podman exec -it <container id> /bin/bash
  5. From inside the container, send the TSTP signal: kill -s TSTP 2
  6. From inside the container, send the CONT signal: kill -s CONT 2
  7. Optional: repeat steps 1-6 with --init removed from step 1 and the PID in steps 5 and 6 changed to 1.
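
The one-liner passed to python -c in step 1 is hard to read, so here it is expanded into a plain script. This is the same logic, except that the handler registration and the wait loop are wrapped in functions (`install_handlers` and `main`, names added here for illustration) so the file can be imported without blocking.

```python
import signal
import time

def SignalHandler_SIGALL(SignalNumber, Frame):
    # Print the signal number and the stack frame that was interrupted.
    print("SignalHandler of {} {}".format(SignalNumber, Frame))

def install_handlers():
    # Trap the two signals under test.
    signal.signal(signal.SIGTSTP, SignalHandler_SIGALL)
    signal.signal(signal.SIGCONT, SignalHandler_SIGALL)

def main():
    install_handlers()
    # Idle until signalled from another shell (steps 5 and 6).
    while True:
        time.sleep(1)
```

To use it in place of the one-liner, add a call to `main()` at the end of the file and run the file as the container's command.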

Describe the results you received

Only the CONT signal is received when running with --init

SignalHandler of 18....

Describe the results you expected

Both signals are received when running with --init

SignalHandler of 20....
SignalHandler of 18....

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 82.46
    systemPercent: 2.15
    userPercent: 15.4
  cpus: 12
  distribution:
    distribution: fedora
    variant: workstation
    version: "37"
  eventLogger: journald
  hostname: p53
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.2.9-200.fc37.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 31223541760
  memTotal: 33444790272
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.3-2.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.3
      commit: 59f2beb7efb0d35611d5818fd0311883676f6f7e
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 25325199360
  swapTotal: 25325199360
  uptime: 0h 3m 26.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/tcullen/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/tcullen/.local/share/containers/storage
  graphRootAllocated: 410792558592
  graphRootUsed: 86001356800
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 539
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/tcullen/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.4
  Built: 1680521485
  BuiltTime: Mon Apr  3 05:31:25 2023
  GitCommit: ""
  GoVersion: go1.19.7
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.4

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Additional information

giuseppe commented 1 year ago

This is done on purpose by catatonit: https://github.com/openSUSE/catatonit/blob/main/catatonit.c#L272-L280

I've verified that if I drop these lines, I get the behavior you expected. Whether this is better or worse is up for debate; I personally have no preference between the two behaviors.

I am closing the issue since there is nothing we can do on the Podman side. Would you mind opening an issue for catatonit to start the discussion?