
Healthcheck always shows "starting" #19326

Closed acidumirae closed 1 year ago

acidumirae commented 1 year ago

Issue Description

When running a container with a health check defined, podman reports the following state and never transitions out of "starting":

               "Health": {
                    "Status": "starting",
                    "FailingStreak": 0,
                    "Log": null
               },

Steps to reproduce the issue

  1. Run podman with a health check defined:
     podman run --health-cmd 'pg_isready -U postgres' --health-interval 10s --health-timeout 5s --health-retries 30 -e POSTGRES_PASSWORD=password docker.io/library/postgres:15.2-alpine
  2. Check the state:
     podman inspect $cid
     "Health": { "Status": "starting", "FailingStreak": 0, "Log": null },

Describe the results you received

podman inspect $cid

"Health": { "Status": "starting", "FailingStreak": 0, "Log": null },

Describe the results you expected

"Health": { "Status": "healthy", "FailingStreak": 0, "Log": null },

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-r1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: unknown'
  cpuUtilization:
    idlePercent: 99.65
    systemPercent: 0.12
    userPercent: 0.23
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: alpine
    version: 3.18.2
  eventLogger: file
  hostname: embed
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.38-0-lts
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1238466560
  memTotal: 5821636608
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-r0
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /tmp/podman-run-1000/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /tmp/podman-run-1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-r0
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 91h 19m 52.00s (Approximately 3.79 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /home/vagrant/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/vagrant/.local/share/containers/storage
  graphRootAllocated: 20646682624
  graphRootUsed: 10664951808
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /tmp/containers-user-1000/containers
  transientStore: false
  volumePath: /home/vagrant/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 1688368964
  BuiltTime: Mon Jul  3 15:22:44 2023
  GitCommit: ""
  GoVersion: go1.20.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Alpine 3.18.2 running in a Hyper-V VM


acidumirae commented 1 year ago

I built podman 4.6.0 from source and the issue still reproduces.

apk add --update alpine-sdk
apk add --virtual build-dependencies build-base gcc wget git
apk add --no-cache --virtual .build-deps bash gcc musl-dev openssl go
apk add linux-lts-dev linux-headers btrfs-progs-dev device-mapper lvm2-dev gpgme-dev libseccomp-dev
export GOROOT=/usr/lib/go
export GOPATH=/go
export PATH=/go/bin:$PATH
mkdir -p ${GOPATH}/src ${GOPATH}/bin
go install github.com/Masterminds/glide@latest # test go integrity
make BUILDTAGS="seccomp" PREFIX=/usr

podman run --health-cmd 'pg_isready -U postgres' --health-interval 10s --health-timeout 5s --health-retries 30 -e POSTGRES_PASSWORD=password docker.io/library/postgres:15.2-alpine

$ sudo podman inspect vigorous_babbage | more

"Health": { "Status": "starting", "FailingStreak": 0, "Log": null },

mohsinsarwari commented 1 year ago

I'm also having the same issue. When I run the healthcheck manually with podman healthcheck run {container_name}, the status changes to healthy. It seems that podman is not creating the systemd service and timer to run the healthcheck automatically. I've also tried running my container as root, but I still have the same issue.
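For reference, the manual workaround looks like this (a minimal sketch; "my_postgres" is a placeholder container name):

    # Fire the healthcheck once by hand; this is the step the systemd
    # timer would normally perform on a schedule, and it records the result
    podman healthcheck run my_postgres

    # Confirm the stored status is no longer "starting"
    podman inspect --format '{{.State.Health.Status}}' my_postgres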

Luap99 commented 1 year ago

Podman uses systemd timers to run healthchecks periodically, so if you use a distro without systemd they will not work.
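On a systemd host you can see what this relies on (a sketch; unit names derive from the container ID and the exact output shape may differ by version):

    # Each container with a healthcheck gets a transient timer/service pair
    systemctl --user list-timers --all | grep "$cid"

    # The service triggered by the timer effectively runs:
    #   podman healthcheck run <container-id>
    systemctl --user cat "$cid.service"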

acidumirae commented 1 year ago

@Luap99 so is this a bug or a feature, then? How does docker handle it just fine without systemd?

Luap99 commented 1 year ago

Well, docker runs as a daemon, so it can just execute the healthchecks itself.

With podman there is no guarantee that a podman process is around, so we need to rely on another process to execute us periodically, and systemd fits that role well.

vrothberg commented 1 year ago

On a non-systemd machine, I would expect Podman to throw an error. At least for the case where users explicitly set health checks via the CLI; I am not sure about the case where Podman uses the checks from the image.

Luap99 commented 1 year ago

I don't think we should error for images (see https://github.com/containers/podman/pull/16749). But I think we must make sure the status does not show "starting" and instead acts the same as when healthchecks are disabled.

As for erroring when they are set on the CLI, that might make more sense, but I am not sure we can still tell, that far down in the backend, where the healthcheck was set.
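For comparison, the disabled case can be produced directly (a sketch; --no-healthcheck is the existing podman run flag, and the expectation here is that inspect then reports no perpetual "starting" status):

    # Disable any healthcheck baked into the image; on a non-systemd host
    # the stuck-"starting" state should arguably look the same as this
    podman run -d --no-healthcheck docker.io/library/postgres:15.2-alpine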

vrothberg commented 1 year ago

> As for erroring when they are set on the CLI, that might make more sense, but I am not sure we can still tell, that far down in the backend, where the healthcheck was set.

I think we could handle it in the front end and check whether systemd is available.

Luap99 commented 1 year ago

> I think we could handle it in the front end and check whether systemd is available.

We need to be careful with front-end checks; they will result in false positives when using podman-remote.

vrothberg commented 1 year ago

> > I think we could handle it in the front end and check whether systemd is available.
>
> We need to be careful with front-end checks; they will result in false positives when using podman-remote.

Fair point. It would only fail for local Linux cases.

acidumirae commented 1 year ago

It would be nice to mention this at https://podman.io/docs/installation#build-tags, which currently only says systemd is used for journaling, and to at least emit a warning that --health-cmd, --health-interval, --health-timeout, and --health-retries will do nothing if the timers are not implemented. As it stands, podman cannot be a drop-in replacement for docker: docker-compose breaks on non-systemd systems when you have dependencies and healthchecks.

rhatdan commented 1 year ago

You could set up cron jobs to fire the checks. But we are taking advantage of systemd for many functions of Podman, and this is not likely to change. If you would like to open a PR changing the documentation to advise people on non-systemd machines how to use cron jobs for healthchecks, that would be fine. I am moving this to a discussion since it is not something we plan to fix.
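A minimal sketch of that cron workaround ("my_postgres" is a placeholder container name; note that cron's one-minute granularity is coarser than the 10s --health-interval requested above):

    # crontab entry for the user owning the rootless container;
    # rootless setups may additionally need environment such as
    # XDG_RUNTIME_DIR exported for podman to find its runtime dir
    # m h dom mon dow  command
    * * * * * podman healthcheck run my_postgres >/dev/null 2>&1

podman healthcheck run executes the configured health command inside the container and records the result, so the inspect status updates without any systemd timer, as noted earlier in the thread.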