containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0
23.83k stars 2.42k forks source link

Run container in a system without syslog plus debug log level, the exit will time out #19075

Closed fangpenlin closed 1 year ago

fangpenlin commented 1 year ago

Issue Description

I ran into a problem, when you run the podman command in an environment where syslog is not available with a debug log level, the container exit is going to timeout. For example, in an Alpine docker container with podman package installed, then you ran this

podman --log-level=debug run -it alpine echo hi

Then you will see after hi shows, it takes 20 seconds until the timeout waiting for container error

DEBU[0000] Starting container 995e71cd3e7a94b39457b209bb4abba49f3903656ef48d3b895a4e814ce24cdf with command [echo hi]
DEBU[0000] Received a resize event: {Width:198 Height:50}
DEBU[0000] Started container 995e71cd3e7a94b39457b209bb4abba49f3903656ef48d3b895a4e814ce24cdf
DEBU[0000] Notify sent successfully
hi
DEBU[0020] Exceeded conmon timeout waiting for container 995e71cd3e7a94b39457b209bb4abba49f3903656ef48d3b895a4e814ce24cdf to exit
DEBU[0020] Called run.PersistentPostRunE(podman --log-level=debug run -it alpine echo hi)
DEBU[0020] Shutting down engines

After tracing into the code I now understand why. The first would be, when debug log level is provided, --syslog will be added to the conmon exit command arguments from here

https://github.com/containers/podman/blob/82af41cf2a48d879e14aa6eaee1ecf26e9a6e8af/pkg/specgenutil/util.go#L304-L306

Then syslog enabled or not is from here

https://github.com/containers/podman/blob/82af41cf2a48d879e14aa6eaee1ecf26e9a6e8af/libpod/oci_conmon_common.go#L1111

As you can see the part

logrus.IsLevelEnabled(logrus.DebugLevel)

However, when the command runs, it sees the syslog flag is set, it will try to use it.

https://github.com/containers/podman/blob/82af41cf2a48d879e14aa6eaee1ecf26e9a6e8af/cmd/podman/syslog_common.go#L20-L24

But then it will fail because syslog was not available on my Alpine environment. So it exits with code 1. And that's the cleanup command exits without even doing anything at all. As a result, the parent podman process will wait for the result updates set by cleanup but it will never happen.

Steps to reproduce the issue

Steps to reproduce the issue

docker run --privileged --entrypoint=/bin/sh -i alpine:3.18.2 \
   -c "apk add podman fuse fuse-overlayfs && \
    sed -e 's|^#mount_program|mount_program|g' /usr/share/containers/storage.conf > /etc/containers/storage.conf && \
    podman --log-level=debug run -i alpine:3.18.2 echo hi"

Describe the results you received

time="2023-06-30T18:53:23Z" level=info msg="Got Conmon PID as 88"
time="2023-06-30T18:53:23Z" level=debug msg="Created container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2 in OCI runtime"
time="2023-06-30T18:53:23Z" level=debug msg="Attaching to container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2"
time="2023-06-30T18:53:23Z" level=debug msg="Starting container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2 with command [echo hi]"
time="2023-06-30T18:53:23Z" level=debug msg="Started container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2"
time="2023-06-30T18:53:23Z" level=debug msg="Notify sent successfully"
hi
time="2023-06-30T18:53:43Z" level=debug msg="Exceeded conmon timeout waiting for container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2 to exit"
time="2023-06-30T18:53:43Z" level=debug msg="Called run.PersistentPostRunE(podman --log-level=debug run -i alpine:3.18.2 echo hi)"
time="2023-06-30T18:53:43Z" level=debug msg="Shutting down engines"

Describe the results you expected

The cleanup should finish without timeout error i.e, make clean up command not fail when syslog is not available. If syslog is necessary for running with debug log level (I saw somewhere it mentions some communications are done with syslog?), it should display an error instead of run the command and wait until timeout.

podman info output

host:
  arch: amd64
  buildahVersion: 1.31.0-dev
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-r1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: unknown'
  cpuUtilization:
    idlePercent: 99.29
    systemPercent: 0.29
    userPercent: 0.42
  cpus: 32
  databaseBackend: boltdb
  distribution:
    distribution: alpine
    version: 3.18.2
  eventLogger: file
  freeLocks: 2048
  hostname: d80c08b11442
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.90.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 910123008
  memTotal: 16720412672
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.6.0-r0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.6.0
    package: netavark-1.6.0-r0
    path: /usr/libexec/podman/netavark
    version: netavark 1.6.0
  ociRuntime:
    name: crun
    package: crun-1.8.4-r0
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-r0
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 3524632576
  swapTotal: 4294967296
  uptime: 48h 40m 56.00s (Approximately 2.00 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.11-r1
      Version: |-
        fuse-overlayfs: version 1.11
        fusermount3 version: 3.14.1
        FUSE library version 3.14.1
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 269427478528
  graphRootUsed: 69159690240
  graphStatus:
    Backing Filesystem: overlayfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.7.0-dev
  Built: 1688112376
  BuiltTime: Fri Jun 30 08:06:16 2023
  GitCommit: fca3c2ef8476669825564af89b008946bdf87599-dirty
  GoVersion: go1.20.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0-dev


### Podman in a container

Yes

### Privileged Or Rootless

None

### Upstream Latest Release

Yes

### Additional environment details

Additional environment details

### Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting
fangpenlin commented 1 year ago

For people who are looking into workaround, such as running podman in alpine, an easy way is to simply install openrc or s6-overlay with a syslog running when the container starts, like this:

 FROM alpine
 RUN apk add s6-overlay s6-overlay-syslogd podman fuse fuse-overlayfs
 RUN sed -e 's|^#mount_program|mount_program|g' /usr/share/containers/storage.conf > /etc/containers/storage.conf
 ENTRYPOINT ["/init"]

This way, when you run with debug log level, syslog can be found and the exit won't timeout

docker run --privileged -i my-image podman --log-level=debug run -i alpine echo hi
Luap99 commented 1 year ago

What is the error message when you run something like: podman --syslog info. We should likely just ignore the error for the cleanup command.

fangpenlin commented 1 year ago
docker run --privileged --entrypoint=/bin/sh -i alpine:3.18.2 \
   -c "apk add podman fuse fuse-overlayfs && \
    sed -e 's|^#mount_program|mount_program|g' /usr/share/containers/storage.conf > /etc/containers/storage.conf && \
    podman --syslog info"

and the result:

Failed to initialize syslog hook: Unix syslog delivery error