Open joelpurra opened 1 year ago
A friendly reminder that this issue had no activity for 30 days.
@umohnani8 @mheon @Luap99 @vrothberg @ygalblum WDYT?
Not sure how good my idea is, but what about using `nsenter -t <ContainerPID> -n -U curl/nc` instead of `exec`ing into the pod? This way we only enter the network and user namespace of the pod while keeping the access to the executable on the host.
In general that should be better, but it still requires these deps to be installed on the host. `curl` and `nc` are definitely not installed by default on all distros, so that might even cause regressions.
I think if we fix this we might as well do it properly and not depend on external commands. Checking for a TCP port and doing an HTTP GET request can be done trivially in Go. The only question would be how we expose this in our internal healthcheck logic.
I like the idea of building some of these into podman and not relying on external tools. Exposing is the issue.
Has there been any further thought about this? I'd also like to see this functionality brought to `podman run` & Quadlet, so that I could easily define an HTTP health check for a container running as a systemd service. With this and #18189 together, that would be a powerful combination.
Something like `podman run --health-http-probe` and `--health-tcp-probe`? Though, that raises a new question: what should podman do when multiple probes are configured?
In Alpine Linux, `curl` is not installed by default. As an alternative to `curl`, you can use `wget`: https://github.com/containers/podman/blob/v4.9.3/pkg/specgen/generate/kube/kube.go#L679
curl:
commandString = fmt.Sprintf("curl -f %s://%s:%d%s || %s", uriScheme, host, portNum, path, failureCmd)
wget:
commandString = fmt.Sprintf("wget -q -O /dev/null %s://%s:%d%s || %s", uriScheme, host, portNum, path, failureCmd)
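One way the two variants above could be combined is a command string that prefers `curl` and falls back to `wget`. This is only a sketch: `probeCommand` is a hypothetical helper and not part of `kube.go`. It assumes the resulting string is run by a POSIX shell inside the container that supports `command -v`.

```go
package main

import "fmt"

// probeCommand builds a shell snippet that uses curl when it is on the PATH
// and falls back to wget otherwise. failureCmd runs when the chosen tool
// exits non-zero. Hypothetical helper for illustration only.
func probeCommand(uriScheme, host string, portNum int, path, failureCmd string) string {
	url := fmt.Sprintf("%s://%s:%d%s", uriScheme, host, portNum, path)
	return fmt.Sprintf(
		"if command -v curl >/dev/null 2>&1; then curl -f %[1]s; else wget -q -O /dev/null %[1]s; fi || %[2]s",
		url, failureCmd)
}

func main() {
	// Placeholder values; prints the generated command string.
	fmt.Println(probeCommand("http", "localhost", 8080, "/healthz", "exit 1"))
}
```

This still depends on at least one of the two tools being present in the image, so it narrows the problem described in this issue rather than removing it.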
Issue Description
I'm moving some servers/services to `podman kube play` and ran into a problem. Several (not all) servers died after a few minutes, seemingly consistent with configured probe limits, despite it being clear that the services were actually reachable and usable from clients. Disabling the health checks also meant that the service would stay up. After some digging I found the issue.

Container health check probes (`startupProbe`, `readinessProbe`, `livenessProbe`) with checks of kind `tcpSocket` or `httpGet` are effectively equivalent to `exec` checks. This is because they get converted to `exec` commands by `podman` in `kube.go`.

The `exec` conversion means executing `nc` to check for open TCP ports or `curl` to `GET` an HTTP URL, from inside the container. Containers which only have the bare minimum of software installed (as is best practice) may not have these "external dependencies", in which case the probes will always fail.

It is my understanding that both `tcpSocket` and `httpGet` should probe from within the pod, but not from within the particular container it probes. This places the `nc`/`curl` (or equivalent) dependency requirements on the pod manager.

Should these TCP/HTTP probe connection attempts be implemented in `podman` instead?

Idea: probe dependencies do not have to be direct dependencies of `podman`. Podman may use minimal "probe images", and delegate checks to ephemeral health check containers. This may increase flexibility and potentially allow for broader probe kind support.

Steps to reproduce the issue
- [...] `podman kube play`, where at least one container has well-configured health checks of kinds `tcpSocket` or `httpGet`.
- [...] `healthy` state.
- [...] `healthy` state, inspect if the container/image has `nc`/`curl` (with sufficient feature support) installed in the `$PATH`.

Describe the results you received
Health check results depend not only on the containerized server/service itself, but also on other software included in the container/image.
Describe the results you expected
I was under the impression that "outside" health checks, such as `tcpSocket` and `httpGet`, should not rely on health check software (which is not usually a part of the actual server/service software) within the container itself.

podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Tested on:
Additional information
Test cases
The Kubernetes documentation provides probe examples (referenced in `kube.go`) which can be executed directly with `podman kube play`. While monitoring `podman` container statuses, `kube play` each yaml file for at least a minute before taking it `--down`.

exec-liveness.yaml
The `exec` probe works as expected, entering the `healthy` state immediately and later restarting when the health check deliberately fails. Failure command output: `cat: can't open '/tmp/healthy': No such file or directory`.

tcp-liveness-readiness.yaml
The `tcpSocket` probe never leaves the `starting` state, and gets restarted after several failures. There is no command output.

http-liveness.yaml
The `httpGet` probe never leaves the `starting` state, and gets restarted after several failures. There is no command output.

grpc-liveness.yaml
The `grpc` probe is not supported by `podman`, but is included here for completeness with the health check examples from Kubernetes.io.

It seems the `grpc` probe is ignored, and the container keeps running without a health state (`starting`, `healthy`, `unhealthy`, ...) in the `podman ps` output. This may be used as an example of additional "outside" health check kinds, which may be separately containerized without imposing these dependencies on the `podman` binary itself. See [gRPC health checks](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).

```shell
podman kube play 'https://k8s.io/examples/pods/probe/grpc-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/grpc-liveness.yaml'
```

Test monitoring
Monitor the container states separately, for example either by watching `podman ps` "live" or by logging the `podman inspect` output.

Workarounds
- [...] `nc`/`curl` directly in the container in an extra build step.
- [...] `exec` directly in the container. One example may be to test for sockets created when the container/server has initialized fully: `test -S /path/to/server/socket`.
- [...] `perl` or `python`.

Here's an example of using `bash` redirections to simulate a `nc -z` check on `localhost:8080` (TCP). Note that this workaround will send (empty) data to the server port, which may cause side-effects if the server acts on the incoming connection.

On failure the output is `bash: connect: Connection refused\nbash: line 1: /dev/tcp/localhost/8080: Connection refused` and exit code is non-zero.

Executing nc in common base images
The same issue arises for "simplified" command versions, such as [`nc` in `busybox`](https://boxmatrix.info/wiki/Property:nc) which doesn't always support the `-z` nor `-v` options/features (depending on compile flags and `busybox` version).

```shell
podman run --rm busybox nc
```

```text
BusyBox v1.22.1 (2014-05-22 23:22:11 UTC) multi-call binary.

Usage: nc [-iN] [-wN] [-l] [-p PORT] [-f FILE|IPADDR PORT] [-e PROG]

Open a pipe to IP:PORT or FILE

    -l       Listen mode, for inbound connects
             (use -ll with -e for persistent server)
    -p PORT  Local port
    -w SEC   Connect timeout
    -i SEC   Delay interval for lines sent
    -f FILE  Use file (ala /dev/ttyS0) instead of network
    -e PROG  Run PROG after connect
```

```shell
podman run --rm alpine nc
```

```text
BusyBox v1.35.0 (2022-11-19 10:13:10 UTC) multi-call binary.

Usage: nc [OPTIONS] HOST PORT - connect
nc [OPTIONS] -l -p PORT [HOST] [PORT] - listen

    -e PROG  Run PROG after connect (must be last)
    -l       Listen mode, for inbound connects
    -lk      With -e, provides persistent server
    -p PORT  Local port
    -s ADDR  Local address
    -w SEC   Timeout for connects and final net reads
    -i SEC   Delay interval for lines sent
    -n       Don't do DNS resolution
    -u       UDP mode
    -b       Allow broadcasts
    -v       Verbose
    -o FILE  Hex dump traffic
    -z       Zero-I/O mode (scanning)
```

```shell
podman run --rm centos nc
```

Could not find `nc` in `$PATH`.

```shell
podman run --rm fedora nc
```

Could not find `nc` in `$PATH`.

```shell
podman run --rm debian nc
```

Could not find `nc` in `$PATH`.

```shell
podman run --rm ubuntu nc
```

Could not find `nc` in `$PATH`.

Executing curl in common base images
It's less common to find `curl` installed.

```shell
podman run --rm busybox curl
```

Could not find `curl` in `$PATH`.

```shell
podman run --rm alpine curl
```

Could not find `curl` in `$PATH`.

```shell
podman run --rm centos curl
```

```text
curl: try 'curl --help' or 'curl --manual' for more information
```

```shell
podman run --rm fedora curl
```

```text
curl: try 'curl --help' or 'curl --manual' for more information
```

```shell
podman run --rm debian curl
```

Could not find `curl` in `$PATH`.

```shell
podman run --rm ubuntu curl
```

Could not find `curl` in `$PATH`.

Running a personal Open Build Service (OBS) branch of `podman` v4.5.0 (as suggested in another issue), with a build dependency fix and added BTRFS support. I'm just starting out using OBS, but it should not affect this issue.

apt show podman
```text
Package: podman
Version: 4:4.5.0-debian12joelpurra1+obs82.1
Priority: optional
Maintainer: Podman Debbuild Maintainers