containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0
23.46k stars 2.38k forks source link

`podman container run` fails if `$LISTEN_FDS` is defined, even though `$LISTEN_PID` does not match podman's pid #20968

Open Arnavion opened 10 months ago

Arnavion commented 10 months ago

Issue Description

As the title says and repro steps show.

Steps to reproduce the issue

In a shell:

# This is guaranteed to not be podman's PID. 1 is not special here, any other value will reproduce the issue
# as long as it's not podman's PID.
export LISTEN_PID=1

# Succeeds and prints "5" as expected
podman container run -it --rm debian:12-slim echo 5

# Also succeeds and prints "5" as expected
LISTEN_FDS=0 podman container run -it --rm debian:12-slim echo 5

# Fails
LISTEN_FDS=1 podman container run -it --rm debian:12-slim echo 5

Describe the results you received

$ LISTEN_FDS=1 podman container run -it --rm debian:12-slim echo 5

ERRO[0000] Removing container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e from OCI runtime: saving container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e state: beginning container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e save transaction: disk I/O error: bad file descriptor 
ERRO[0000] Unmounting container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e storage: unmounting container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e: saving container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e state: beginning container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e save transaction: disk I/O error: bad file descriptor 
ERRO[0000] Pruning container exit codes: beginning transaction to remove old timestamps: disk I/O error: bad file descriptor 
ERRO[0000] Cleaning up container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e: removing container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e network: saving container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e state: beginning container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e save transaction: disk I/O error: bad file descriptor 
Error: saving container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e state: beginning container 43118622f69036b89ddea7e86323559b7b97b8b2dba6f532d3edf7be34ff053e save transaction: disk I/O error: bad file descriptor

Describe the results you expected

It should've succeeded and printed 5 just like the first two. LISTEN_PID is not podman's pid so it should not be affected by LISTEN_FDS being set.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.2
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: unknown'
  cpuUtilization:
    idlePercent: 98.92
    systemPercent: 0.48
    userPercent: 0.6
  cpus: 32
  databaseBackend: sqlite
  distribution:
    distribution: opensuse-tumbleweed
    version: "20231207"
  eventLogger: journald
  freeLocks: 2048
  hostname: futaba
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.6.3-1-default
  linkmode: dynamic
  logDriver: journald
  memFree: 36155625472
  memTotal: 66481369088
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns: {}
    package: |-
      cni-1.1.2-3.1.x86_64
      cni-plugins-1.3.0-2.1.x86_64
    path: /usr/libexec/cni
  ociRuntime:
    name: crun
    package: crun-1.12-1.1.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.12
      commit: ce429cb2e277d001c2179df1ac66a470f00802ae
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.1.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 43h 31m 18.00s (Approximately 1.79 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - registry.suse.com
  - docker.io
store:
  configFile: /home/arnavion/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/arnavion/.local/share/containers/storage
  graphRootAllocated: 999697944576
  graphRootUsed: 156854591488
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/arnavion/.local/share/containers/storage/volumes
version:
  APIVersion: 4.8.1
  Built: 1701820800
  BuiltTime: Tue Dec  5 16:00:00 2023
  GitCommit: ""
  GoVersion: go1.21.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.8.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

I run tmux as a socket-activated systemd service, so all shells in my tmux windows inherit the tmux daemon's LISTEN_* env vars. Today I noticed podman container run fails when run in such tmux shells. Since it did still work fine in non-tmux shells, I diffed the env vars inside and outside tmux, noticed the LISTEN_* env vars difference, and replicated it with LISTEN_PID and LISTEN_FDS set even in a non-tmux shell.

This used to work fine ~1 month ago, at which time the OpenSUSE TW package was at v4.7.2, so this must have been introduced after that. I didn't verify v4.7.2 though, since I would have to build it myself to test. If you can't repro it I can make that effort.

Additional information

No response

Luap99 commented 10 months ago

This seems to depend on the database backend for some reason which would explain why it started with 4.8.

I think the main issue is that the code does not seem to validate that we actually got the LISTEN_FDS from the parent and we somehow end up closing the fd which generates the database errors.

But yes checking for LISTEN_PID seems reasonable but this is not something we ever did before, the main issue is that we may reexec ourself to create the userns as rootless in which case LISTEN_PID may not the current pid of the process so we would need to account for that. Another problem would be that testing would be more difficult as we would need to know the pid in advance.

Arnavion commented 10 months ago

Checking for $LISTEN_FDS without checking for $LISTEN_PID is never right. If you have something where you cannot rely on $LISTEN_PID being correct, then you shouldn't use $LISTEN_FDS either but come up with some other mechanism to convey that information.

That is, it sounds like you're saying process A spawns process B, process A might be socket-activated so $LISTEN_PID is A's pid, but it's B that actually does something with the socket-activated fd, so B acts on $LISTEN_FDS that it inherited from A and ignores $LISTEN_PID.

Instead, it should be something like: A notices that $LISTEN_PID matches its pid, so it execs B as B --listen-fds $LISTEN_FDS, and B ignores the inherited $LISTEN_FDS and only processes the --listen-fds CLI parameter.

github-actions[bot] commented 9 months ago

A friendly reminder that this issue had no activity for 30 days.