containers / podman-compose

a script to run docker-compose.yml using podman
GNU General Public License v2.0

Podman Compose 1.1.0 fails non-deterministically to start a system with multiple dependencies #921

Open devurandom opened 6 months ago

devurandom commented 6 months ago

Describe the bug

Podman Compose 1.1.0 fails non-deterministically to start a system with multiple dependencies: Error: unable to start container 5cdaac...: generating dependency graph for container 5cdaac...: container 1534f2... depends on container e524cd... not found in input list: no such container.

Most of the time, starting the system fails; only occasionally does it come up.

This appears to be a regression in podman-compose-1.1.0-1.fc40. I did not observe this issue with podman-compose-1.0.6-6.fc40.

To Reproduce

# mwe.yaml
services:
  a:
    depends_on:
      b:
        condition: service_healthy
      c:
        condition: service_healthy
      d:
        condition: service_started
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
  g:
    depends_on:
      a:
        condition: service_healthy
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
  b:
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
    healthcheck:
      test: ["CMD", "true"]
      start_period: 10s
  c:
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
    healthcheck:
      test: ["CMD", "true"]
      start_period: 10s
  d:
    depends_on:
      b:
        condition: service_healthy
      e:
        condition: service_healthy
      f:
        condition: service_healthy
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
  e:
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
    healthcheck:
      test: ["CMD", "true"]
      start_period: 10s
  f:
    image: docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b
    command:
      - sleep
      - inf
    healthcheck:
      test: ["CMD", "true"]
      start_period: 10s

❯ ls -a
mwe.yaml
❯ podman-compose --project-name=mwe --file=mwe.yaml up
218fe37c3dfd3dff812189148c59668d9a154fce19f51788901eb685492c98f5
9ac84add9eb0dfba7e06d62a288e7fb65ba51d2cbbdaa5883110c8b530f58e33
05af0dec32bd8ce4238c7da6e69a7b716d7cfb756f0f53d3205ab99b32480e45
873939148ce26f231ab204edd2a74f3ee42cd3541ed2842bfff8ac05a676186d
e524cdc1115a3f9fc3403f91e91fa931f69f121043a3e49382a74a2a607631a6
1534f2df395b1a7c0357daa8b60c032ca3824234de61ca1289cc501c0fdad079
7327f35dc2dae15a28a229cd3fa360e8a14e3ceb991250d6eb81308bf60f3a82
5cdaac3f8de071ca8e6a889e90989518299878e8977a6179982e268e94d4d6f3
[g] | Error: unable to start container 5cdaac3f8de071ca8e6a889e90989518299878e8977a6179982e268e94d4d6f3: generating dependency graph for container 5cdaac3f8de071ca8e6a889e90989518299878e8977a6179982e268e94d4d6f3: container 1534f2df395b1a7c0357daa8b60c032ca3824234de61ca1289cc501c0fdad079 depends on container e524cdc1115a3f9fc3403f91e91fa931f69f121043a3e49382a74a2a607631a6 not found in input list: no such container

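Expected behavior

All containers start in dependency order and the full system comes up, as it did with podman-compose 1.0.6.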

Actual behavior

Podman Compose prints Error: unable to start container 5cdaac...: generating dependency graph for container 5cdaac...: container 1534f2... depends on container e524cd... not found in input list: no such container. It then continues starting the other containers, but the full system never comes up.
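
When comparing failed runs, it can help to pull the three container IDs out of the error line. The following is a minimal sketch that assumes the message keeps the wording shown above; `parse_dependency_error` and the group names are made up for illustration and are not part of Podman or podman-compose.

```python
import re

# Pattern modelled on the error text quoted in this report (an assumption:
# other Podman versions may word the message differently).
ERROR_RE = re.compile(
    r"unable to start container (?P<target>[0-9a-f]+): "
    r"generating dependency graph for container [0-9a-f]+: "
    r"container (?P<dependent>[0-9a-f]+) depends on container "
    r"(?P<missing>[0-9a-f]+) not found in input list"
)


def parse_dependency_error(line):
    """Return the IDs named in the error line, or None if it does not match."""
    match = ERROR_RE.search(line)
    return match.groupdict() if match else None


# Shortened stand-in IDs, not the real ones from the log above.
sample = (
    "[g] | Error: unable to start container aaa111: "
    "generating dependency graph for container aaa111: "
    "container bbb222 depends on container ccc333 "
    "not found in input list: no such container"
)
print(parse_dependency_error(sample))
```

Each extracted ID could then be looked up with `podman inspect` to see which service it belongs to and whether the "missing" container still exists.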

Environment:

❯ grep PLATFORM /etc/os-release
PLATFORM_ID="platform:f40"
❯ podman version
Client:       Podman Engine
Version:      5.0.2
API Version:  5.0.2
Go Version:   go1.22.1
Built:        Wed Apr 17 02:00:00 2024
OS/Arch:      linux/amd64
❯ podman-compose --version
podman-compose version 1.1.0
podman version 5.0.2

Related: https://github.com/containers/podman-compose/issues/683 (the error message there is similar, but the cause seems different; the problem there exists with 1.0.6, while the problem here only appeared with 1.1.0)

devurandom commented 6 months ago

I downgraded to Podman Compose 1.0.6 and the system starts up reliably again:

❯ podman-compose --project-name=mwe --file=mwe.yaml up
podman-compose version: 1.0.6
['podman', '--version', '']
using podman version: 5.0.2
** excluding:  set()
['podman', 'ps', '--filter', 'label=io.podman.compose.project=mwe', '-a', '--format', '{{ index .Labels "io.podman.compose.config-hash"}}']
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_b_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=b --net mwe_default --network-alias b --healthcheck-command /bin/sh -c true --healthcheck-start-period 10s docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
3fd3be4ec70ff23a2b77c8b35cdb2b2061489b70601d0e3102e14a505baf56df
exit code: 0
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_c_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=c --net mwe_default --network-alias c --healthcheck-command /bin/sh -c true --healthcheck-start-period 10s docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
6572f923cad548ea042fa1af5049827871e2bf468708b02d998ab5ff59da4a1c
exit code: 0
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_e_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=e --net mwe_default --network-alias e --healthcheck-command /bin/sh -c true --healthcheck-start-period 10s docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
9f314c05936f33c3b2164ceca50a2c5408c7ece84c49116cf908f4ac96d2c61c
exit code: 0
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_f_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=f --net mwe_default --network-alias f --healthcheck-command /bin/sh -c true --healthcheck-start-period 10s docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
f63bb91caae7f4a55376cdf0d8343020b9bfa6787ebb872701a135cb8101ae67
exit code: 0
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_d_1 --requires=mwe_e_1,mwe_f_1,mwe_b_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=d --net mwe_default --network-alias d docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
cec5f700f0716b3d639b53cf1aa348b98966b686d5b787c64b7747941d10b0c1
exit code: 0
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_a_1 --requires=mwe_e_1,mwe_b_1,mwe_d_1,mwe_f_1,mwe_c_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=a --net mwe_default --network-alias a docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
48b976a077060ce1d8365dc97b97646a60cbaa38ed557977558c21031692951a
exit code: 0
['podman', 'network', 'exists', 'mwe_default']
podman create --name=mwe_g_1 --requires=mwe_e_1,mwe_b_1,mwe_d_1,mwe_a_1,mwe_f_1,mwe_c_1 --label io.podman.compose.config-hash=ffa681596f99163cdd1b6b21e1d304e83680fee6959c54ec86a54425340a06a7 --label io.podman.compose.project=mwe --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@mwe.service --label com.docker.compose.project=mwe --label com.docker.compose.project.working_dir=[REDACTED]/mwe --label com.docker.compose.project.config_files=mwe.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=g --net mwe_default --network-alias g docker.io/library/alpine:3.19.1@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b sleep inf
3d75d60dfd53dcacd0f57d3c2fbed78fc18567847ad35c02653a6b6dc12fee93
exit code: 0
podman start -a mwe_b_1
podman start -a mwe_c_1
podman start -a mwe_e_1
podman start -a mwe_f_1
podman start -a mwe_d_1
podman start -a mwe_a_1
podman start -a mwe_g_1
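
The create order and the `--requires` lists in the 1.0.6 log above are consistent with a transitive dependency closure over `depends_on`. Here is a minimal sketch that reproduces them from the graph in mwe.yaml. It is an illustration only, not podman-compose's actual code; `transitive_deps` and `creation_order` are hypothetical names.

```python
# Direct dependencies copied from mwe.yaml, in file order (conditions omitted).
DEPENDS_ON = {
    "a": ["b", "c", "d"],
    "g": ["a"],
    "b": [],
    "c": [],
    "d": ["b", "e", "f"],
    "e": [],
    "f": [],
}


def transitive_deps(service, graph, seen=None):
    """Collect every service reachable from `service` via depends_on."""
    seen = set() if seen is None else seen
    for dep in graph[service]:
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, graph, seen)
    return seen


def creation_order(graph):
    """Order services so that each comes after everything it requires."""
    closure = {name: transitive_deps(name, graph) for name in graph}
    # In an acyclic graph a service always has a strictly larger closure than
    # any of its dependencies, so sorting by closure size is a valid order.
    # sorted() is stable, so ties keep the order of the compose file.
    order = sorted(graph, key=lambda name: len(closure[name]))
    return order, closure


order, closure = creation_order(DEPENDS_ON)
for name in order:
    requires = ",".join(f"mwe_{dep}_1" for dep in sorted(closure[name]))
    print(f"mwe_{name}_1 requires: {requires or '(nothing)'}")
```

For this graph the order matches the log (b, c, e, f, d, a, g), and each computed set holds the same members as the corresponding `--requires` list, although the log lists them in a different order.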

In case anyone else needs to downgrade, this is how I found an RPM for the previous version:

  1. https://packages.fedoraproject.org/pkgs/podman-compose/podman-compose/
  2. Click "1.1.0-1.fc40"
  3. https://packages.fedoraproject.org/pkgs/podman-compose/podman-compose/fedora-40-updates.html
  4. Click "Search for updates"
  5. https://bodhi.fedoraproject.org/updates/?search=podman-compose-1.1.0-1.fc40
  6. Replace version in search string in URL
  7. https://bodhi.fedoraproject.org/updates/?search=podman-compose-1.0.6-6.fc40
  8. https://bodhi.fedoraproject.org/updates/FEDORA-2024-748f62dc83
  9. https://koji.fedoraproject.org/koji/buildinfo?buildID=2403532
  10. Find "podman-compose-1.0.6-6.fc40.noarch.rpm"

devurandom commented 4 months ago

Still an issue in podman-compose version 1.2.0 with podman version 5.1.1, though it seems a little easier to get the pod into a working state by repeatedly cycling up, down, up, ... until it reaches a stable state.

pfeileon commented 4 months ago

Healthchecks actually don't work at all: https://github.com/containers/podman-compose/issues/866

devurandom commented 4 months ago

> Healthchecks actually don't work at all: #866

Thanks for pointing out that issue. Is this still the case in 1.1.0? If 1.1.0 introduced support for health checks, that might explain why my pod has failed to start since I upgraded from 1.0.6 to 1.1.0.

pfeileon commented 4 months ago

It didn't work on Friday last week on Fedora with the newest version available at the time.

anton-b commented 3 months ago

Any resolution as of yet?

podman-compose --version
podman-compose version 1.2.0
podman version 5.2.2

devurandom commented 2 months ago

By now, I get better results cycling down, up, down, up, ... than down, up, up, .... On the second and later runs of up, the latter results in cannot open '/run/user/1000/crun/e522c0ba12fa8e475d44a6a589934df1fdf75b564865c10062fb62dfac76cfa7/exec.fifo': No such file or directory for various containers in the compose environment, and in my experience it never recovers. The former now has a decent chance of succeeding (I'd say still less than 50%, but already significantly better than when I opened this report).

On Fedora 40:

❯ podman-compose --version
podman-compose version 1.2.0
podman version 5.2.1

az-z commented 2 months ago

podman-compose version 1.2.0
podman version 4.9.4
cat /etc/redhat-release
Fedora release 39 (Thirty Nine)

Linux dell5000 6.5.12-100.fc37.x86_64

No luck here either.
The highly unscientific approach of "up, down, down, wait 15ish seconds, up" tends to get the dependency graph issue resolved. To me, the breaking behavior is that compose continues with the start sequence even if one of the containers fails to start.
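
For anyone who wants to automate the down/wait/up cycling described in this thread, here is a rough sketch. It is a workaround, not a fix. The project name and file are taken from the MWE in this report; `cycle_until_up`, the 12-cycle limit and the 15-second pause are assumptions chosen for illustration. Because compose keeps going after a failed start, the sketch does not trust exit codes and instead checks which containers are actually running. That check is naive and may run before slow containers have finished starting.

```python
import subprocess
import time

PROJECT = "mwe"  # project name from the MWE in this report
COMPOSE = ["podman-compose", f"--project-name={PROJECT}", "--file=mwe.yaml"]


def run_command(argv):
    """Run a command and return its stdout; failures are tolerated on purpose."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


def running_containers(run):
    """Names of the running containers that carry the project label."""
    out = run([
        "podman", "ps",
        "--filter", f"label=io.podman.compose.project={PROJECT}",
        "--format", "{{.Names}}",
    ])
    return set(out.split())


def cycle_until_up(expected, run=run_command, max_cycles=12, pause=15.0,
                   sleep=time.sleep):
    """Cycle down/up until all expected containers run; return the attempt count."""
    for attempt in range(1, max_cycles + 1):
        run(COMPOSE + ["down"])
        sleep(pause)  # the "wait 15ish seconds" step
        run(COMPOSE + ["up", "-d"])
        if expected <= running_containers(run):
            return attempt
    return None  # gave up after max_cycles attempts
```

Calling `cycle_until_up({f"mwe_{s}_1" for s in "abcdefg"})` would then retry until all seven containers of the MWE are listed by `podman ps`, or give up after twelve cycles.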

devurandom commented 3 weeks ago

Persists on Fedora 41:

❯ podman-compose --version
podman-compose version 1.2.0
podman version 5.2.5

Either I get:

Error: unable to start container 74db967475484d0861a2716b6a2fc2d214310670c6a081eb25364d5a82d7e5ee: generating dependency graph for container 74db967475484d0861a2716b6a2fc2d214310670c6a081eb25364d5a82d7e5ee: container b5efa9565e62a9c221aba957cad725fcb3287a7ad64e2eb15ddccd9f698dc062 depends on container 03a44f3ddd150d99b8a50392ef3f409257ac6900cb8e9bd7a07f7e716c723d2b not found in input list: no such container

Or:

cannot open `/run/user/1000/crun/12734a14a8b8fa36a364d5a1e425d6dbb09eec3eba3fa1a13b8b83e0f7eb2eb2/exec.fifo`: No such file or directory
Error: unable to start container 12734a14a8b8fa36a364d5a1e425d6dbb09eec3eba3fa1a13b8b83e0f7eb2eb2: `/usr/bin/crun start 12734a14a8b8fa36a364d5a1e425d6dbb09eec3eba3fa1a13b8b83

(EDIT: I split the 2nd error into https://github.com/containers/podman-compose/issues/1072, in case it happens to others independently of the dependency issue.)

It needs about a dozen "up"/"down" cycles until I finally manage to get a stable system.

On Fedora 40 I would also get errors telling me to kill aardvark-dns and delete /run/user/1000/containers/networks/aardvark-dns (I believe the one added in https://github.com/containers/netavark/pull/856), but I have not seen those yet since updating to Fedora 41.

dwolfeu commented 3 weeks ago

Downgrading to 1.0.6 (using this workaround) also worked for me.

devurandom commented 2 weeks ago

I split the "cannot open /run/user/1000/crun/.../exec.fifo: No such file or directory" error into https://github.com/containers/podman-compose/issues/1072 -- maybe it happens to others also, independently of the dependency issue.