containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Cannot use Kubernetes healthcheck probes without certain executables (tcpSocket/nc, httpGet/curl) inside the container #18318

Open joelpurra opened 1 year ago

joelpurra commented 1 year ago

Issue Description

I'm moving some servers/services to podman kube play and ran into a problem: several (though not all) servers died after a few minutes, at intervals consistent with the configured probe limits, even though the services were clearly reachable and usable from clients. Disabling the health checks also kept the services up. After some digging I found the cause.

Container health check probes (startupProbe, readinessProbe, livenessProbe) with checks of kind tcpSocket or httpGet are effectively equivalent to exec checks. This is because they get converted to exec commands by podman in kube.go.

The exec conversion means executing nc to check for open TCP ports or curl to GET an HTTP URL, from inside the container. Containers which only have the bare minimum of software installed (as is best practice) may not have these "external dependencies", in which case the probes will always fail.

It is my understanding that both tcpSocket and httpGet probes should be performed from within the pod, but not from within the particular container being probed. This places the nc/curl (or equivalent) dependency on the pod manager instead.

Should these TCP/HTTP probe connection attempts be implemented in podman instead?

Idea: probe dependencies do not have to be direct dependencies of podman. Podman may use minimal "probe images", and delegate checks to ephemeral health check containers. This may increase flexibility and potentially allow for broader probe kind support.
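
One way to picture this idea: run the probe from a short-lived container that joins the target container's network namespace, so the probed image itself needs no probe binaries. The following is a hypothetical sketch, not current podman behavior; the container name "myserver" and the URL are examples, and busybox wget stands in for a dedicated probe image:

```shell
# Hypothetical sketch, not current podman behavior: delegate an httpGet
# probe to an ephemeral busybox container that shares the target's
# network namespace ("myserver" and the URL are examples).
podman run --rm --network container:myserver \
    docker.io/library/busybox \
    wget -q -O /dev/null http://localhost:8080/healthz
```

The ephemeral container's exit code would then feed the health state, and the probe dependencies live in the probe image rather than in every application image.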

Steps to reproduce the issue

  1. Start a well-configured server/service in a pod using podman kube play, where at least one container has well-configured health checks of kinds tcpSocket or httpGet.
  2. Monitor the pod to see if the container gets to the healthy state.
  3. If it does not reach the healthy state, check whether the container/image has nc/curl (with sufficient feature support) installed in $PATH.

Describe the results you received

Health check results depend not only on the containerized server/service itself, but also on other software included in the container/image.

Describe the results you expected

I was under the impression that "outside" health checks, such as tcpSocket and httpGet, should not rely on health check software (which is not usually a part of the actual server/service software) within the container itself.

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2:2.1.7-0debian12+obs15.22_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 97.66
    systemPercent: 0.92
    userPercent: 1.43
  cpus: 1
  databaseBackend: boltdb
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  hostname: server
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.0-7-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 97869824
  memTotal: 1004994560
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun_101:1.8.4-0debian12+obs55.7_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-1_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 3581431808
  swapTotal: 3779063808
  uptime: 116h 22m 6.00s (Approximately 4.83 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/username/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /home/username/.local/share/containers/storage
  graphRootAllocated: 31138512896
  graphRootUsed: 6853885952
  graphStatus:
    Build Version: Btrfs v6.2
    Library Version: "102"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/username/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.0
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.19.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Tested on:

Additional information

Test cases

The Kubernetes documentation provides probe examples (referenced in kube.go) which can be executed directly with podman kube play. While monitoring the podman container statuses, play each YAML file for at least a minute before taking it --down.

exec-liveness.yaml

The exec probe works as expected, entering the healthy state immediately and later restarting when the health check deliberately fails. Failure command output: cat: can't open '/tmp/healthy': No such file or directory.

podman kube play 'https://k8s.io/examples/pods/probe/exec-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/exec-liveness.yaml'

tcp-liveness-readiness.yaml

The tcpSocket probe never leaves the starting state, and gets restarted after several failures. There is no command output.

podman kube play 'https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml'

http-liveness.yaml

The httpGet probe never leaves the starting state, and gets restarted after several failures. There is no command output.

podman kube play 'https://k8s.io/examples/pods/probe/http-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/http-liveness.yaml'

grpc-liveness.yaml

The grpc probe is not supported by podman, but is included here for completeness with the health check examples from Kubernetes.io.

It seems the `grpc` probe is ignored, and the container keeps running without a health state (`starting`, `healthy`, `unhealthy`, ...) in the `podman ps` output. This may be used as an example of additional "outside" health check kinds, which may be separately containerized without imposing these dependencies on the `podman` binary itself. See [gRPC health checks](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).

```shell
podman kube play 'https://k8s.io/examples/pods/probe/grpc-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/grpc-liveness.yaml'
```

Test monitoring

Monitor the container states separately, for example either by watching podman ps "live" or by logging the podman inspect output.

# NOTE: watch status live.
watch --differences --interval 1 podman ps

# NOTE: keep a status log.
( while true; do date ; podman inspect --latest | jq '.[] | { Name, Health: .State.Health }' ; sleep 5 ; done ; )

Workarounds

  1. Install nc/curl directly in the container in an extra build step.
  2. Find a different (but equivalent) health check method which may exec directly in the container. One example may be to test for sockets created when the container/server has initialized fully: test -S /path/to/server/socket.
  3. Utilize existing container software for workarounds, perhaps script interpreters such as perl or python.

Here's an example of using bash redirections to simulate a nc -z check on localhost:8080 (TCP). Note that this workaround will send (empty) data to the server port, which may cause side-effects if the server acts on the incoming connection.

On failure the output is bash: connect: Connection refused\nbash: line 1: /dev/tcp/localhost/8080: Connection refused and exit code is non-zero.

livenessProbe:
  # TODO: replace with tcpSocket healthcheck.
  exec:
    command:
      - bash
      - "-c"
      - ": > /dev/tcp/localhost/8080"
  failureThreshold: 3
  initialDelaySeconds: 1
  periodSeconds: 5
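
In the same spirit, an httpGet probe can be approximated with bash built-ins alone. This is a sketch under assumptions: bash must be compiled with /dev/tcp support, only a minimal HTTP/1.0 request is sent, and only the status line is checked; the host, port, and path are examples.

```shell
#!/usr/bin/env bash
# Sketch: approximate an httpGet probe using only bash built-ins
# (no curl/wget). Assumes bash with /dev/tcp support; the host, port,
# and path arguments are examples.
http_probe() {
  local host="$1" port="$2" path="$3" status
  status="$(
    # Open a TCP connection on fd 3 inside a subshell, so a failed
    # connection cannot take down the calling shell.
    exec 2>/dev/null 3<>"/dev/tcp/${host}/${port}" || exit 1
    # Send a minimal HTTP/1.0 request and read back the status line.
    printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "${path}" "${host}" >&3
    read -r _ code _ <&3
    printf '%s' "${code}"
  )" || return 1
  # Treat anything but a 200 status as a probe failure.
  [ "${status}" = "200" ]
}
```

Inlined as an exec probe command (bash -c '…'), this trades the curl dependency for a cruder check that only understands the HTTP status line.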

Executing nc in common base images

The same issue arises for "simplified" command versions, such as [`nc` in `busybox`](https://boxmatrix.info/wiki/Property:nc), which doesn't always support the `-z` nor `-v` options/features (depending on compile flags and `busybox` version).

```shell
podman run --rm busybox nc
```

```text
BusyBox v1.22.1 (2014-05-22 23:22:11 UTC) multi-call binary.

Usage: nc [-iN] [-wN] [-l] [-p PORT] [-f FILE|IPADDR PORT] [-e PROG]

Open a pipe to IP:PORT or FILE

	-l	Listen mode, for inbound connects
		(use -ll with -e for persistent server)
	-p PORT	Local port
	-w SEC	Connect timeout
	-i SEC	Delay interval for lines sent
	-f FILE	Use file (ala /dev/ttyS0) instead of network
	-e PROG	Run PROG after connect
```

```shell
podman run --rm alpine nc
```

```text
BusyBox v1.35.0 (2022-11-19 10:13:10 UTC) multi-call binary.

Usage: nc [OPTIONS] HOST PORT  - connect
nc [OPTIONS] -l -p PORT [HOST] [PORT]  - listen

	-e PROG	Run PROG after connect (must be last)
	-l	Listen mode, for inbound connects
	-lk	With -e, provides persistent server
	-p PORT	Local port
	-s ADDR	Local address
	-w SEC	Timeout for connects and final net reads
	-i SEC	Delay interval for lines sent
	-n	Don't do DNS resolution
	-u	UDP mode
	-b	Allow broadcasts
	-v	Verbose
	-o FILE	Hex dump traffic
	-z	Zero-I/O mode (scanning)
```

```shell
podman run --rm centos nc
```

Could not find `nc` in `$PATH`.

```shell
podman run --rm fedora nc
```

Could not find `nc` in `$PATH`.

```shell
podman run --rm debian nc
```

Could not find `nc` in `$PATH`.

```shell
podman run --rm ubuntu nc
```

Could not find `nc` in `$PATH`.
Executing curl in common base images

It's less common to find `curl` installed.

```shell
podman run --rm busybox curl
```

Could not find `curl` in `$PATH`.

```shell
podman run --rm alpine curl
```

Could not find `curl` in `$PATH`.

```shell
podman run --rm centos curl
```

```text
curl: try 'curl --help' or 'curl --manual' for more information
```

```shell
podman run --rm fedora curl
```

```text
curl: try 'curl --help' or 'curl --manual' for more information
```

```shell
podman run --rm debian curl
```

Could not find `curl` in `$PATH`.

```shell
podman run --rm ubuntu curl
```

Could not find `curl` in `$PATH`.

I'm running a personal Open Build Service (OBS) branch of podman v4.5.0 (as suggested in another issue), with a build dependency fix and added BTRFS support. I'm just starting out with OBS, but this should not affect the issue.

apt show podman

```text
Package: podman
Version: 4:4.5.0-debian12joelpurra1+obs82.1
Priority: optional
Maintainer: Podman Debbuild Maintainers
Installed-Size: 73.2 MB
Provides: podman-manpages (= 4:4.5.0-debian12joelpurra1+obs82.1)
Depends: catatonit,iptables,nftables,conmon (>= 2:2.0.30),containers-common (>= 4:1),uidmap,netavark (>= 1.0.3-1),libc6,libgpg-error0
Recommends: podman-gvproxy (= 4:4.5.0-debian12joelpurra1+obs82.1)
Suggests: qemu-user-static
Homepage: https://podman.io/
Download-Size: 29.3 MB
APT-Manual-Installed: yes
APT-Sources: https://download.opensuse.org/repositories/home:/joelpurra:/branches:/devel:/kubic:/libcontainers:/unstable/Debian_Testing Packages
Description: Manage Pods, Containers and Container Images
 podman (Pod Manager) is a fully featured container engine that is a simple
 daemonless tool. podman provides a Docker-CLI comparable command line that
 eases the transition from other container engines and allows the management
 of pods, containers and images. Simply put: alias docker=podman.
 Most podman commands can be run as a regular user, without requiring
 additional privileges.
 .
 podman uses Buildah(1) internally to create container images. Both tools
 share image (not container) storage, hence each can use or manipulate
 images (but not containers) created by the other.
 .
 Manage Pods, Containers and Container Images
 podman Simple management tool for pods, containers and images

N: There are 2 additional records. Please use the '-a' switch to see them.
```
github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 1 year ago

@umohnani8 @mheon @Luap99 @vrothberg @ygalblum WDYT?

ygalblum commented 1 year ago

Not sure how good my idea is, but what about using nsenter -t <ContainerPID> -n -U curl/nc instead of execing into the pod? This way we only enter the network and user namespace of the pod while keeping the access to the executable on the host.
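
Spelled out, that suggestion might look like the following sketch (untested; the container name "myserver" and the URL are examples):

```shell
# Sketch of the nsenter approach (untested; "myserver" and the URL are
# examples). First resolve the container's init PID on the host...
pid="$(podman inspect --format '{{.State.Pid}}' myserver)"
# ...then run the host's curl inside only the container's network and
# user namespaces, so the image needs no curl of its own:
nsenter --target "$pid" --net --user curl -sf http://localhost:8080/healthz
```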


Luap99 commented 1 year ago

> Not sure how good my idea is, but what about using nsenter -t <ContainerPID> -n -U curl/nc instead of execing into the pod? This way we only enter the network and user namespace of the pod while keeping the access to the executable on the host.

In general that should be better, but it still requires these deps to be installed on the host; curl and nc are definitely not installed by default on all distros, so that might even cause regressions.

I think if we fix this we might as well do it properly and not depend on external commands. Checking a TCP port or making an HTTP GET request can be done trivially in Go. The only question is how to expose this in our internal healthcheck logic.

rhatdan commented 1 year ago

I like the idea of building some of these into podman rather than relying on external tools. Exposing it is the issue.


nogweii commented 11 months ago

Has there been any further thought about this? I'd also like to see this functionality brought to podman run & Quadlet, so that I could easily define an HTTP health check for a container running as a systemd service. With this and #18189 together, that would be a powerful combination.

Something like podman run --health-http-probe and --health-tcp-probe? Though, that raises a new question: what should podman do when multiple probes are configured?

viplifes commented 6 months ago

In Alpine Linux, curl is not installed by default. As an alternative to curl, you can use wget: https://github.com/containers/podman/blob/v4.9.3/pkg/specgen/generate/kube/kube.go#L679

curl:

commandString = fmt.Sprintf("curl -f %s://%s:%d%s || %s", uriScheme, host, portNum, path, failureCmd)

wget:

commandString = fmt.Sprintf("wget -q -O /dev/null %s://%s:%d%s || %s", uriScheme, host, portNum, path, failureCmd)