containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Yet another missing-logs-and-events flake: journald? #24220

Open edsantiago opened 1 month ago

edsantiago commented 1 month ago

I've lost track of how many bugs I've opened for something-or-other like this. I'm going to lump together here the category of flakes seen in late 2024 where podman-remote logs is supposed to see something but doesn't.

Likely fix: change the tests so that, instead of running podman wait followed by a single podman logs, they retry up to 5 times: run podman logs, grep for what we want, and retry if it's not there yet.
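The retry idea can be sketched as follows. This is only an illustration in Python, not the actual test code (the real system and integration tests are not written this way); the helper names `podman_logs` and `wait_for_log`, the one-second delay, and the `podman_cmd` parameter are all assumptions made for the example.

```python
import subprocess
import time


def podman_logs(container, podman_cmd=("podman",)):
    """Return stdout+stderr of `<podman_cmd> logs <container>`.

    podman_cmd is a hypothetical knob so the same helper can be pointed
    at a remote client, e.g. ("podman-remote",).
    """
    result = subprocess.run(
        [*podman_cmd, "logs", container],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


def wait_for_log(fetch, needle, retries=5, delay=1.0, sleep=time.sleep):
    """Call fetch() up to `retries` times until `needle` appears.

    Returns True as soon as the text is seen, False if it never shows up.
    The delay between attempts is an assumed value, not from the issue.
    """
    for attempt in range(retries):
        if needle in fetch():
            return True
        if attempt < retries - 1:
            sleep(delay)  # give the log driver time to flush
    return False
```

A test would then call something like `wait_for_log(lambda: podman_logs("mycontainer"), "expected text")` and fail only if all five attempts come back without the expected text.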

| x | x | x | x | x | x |
| --- | --- | --- | --- | --- | --- |
| sys(1) | remote(2) | fedora-40-aarch64(1) | root(2) | host(2) | sqlite(2) |
| int(1) | | fedora-40(1) | | | |

github-actions[bot] commented 1 week ago

A friendly reminder that this issue had no activity for 30 days.

edsantiago commented 6 days ago

Two on Thursday, but not remote, so I don't know if they're the same bug or something new. The total so far:

| x | x | x | x | x | x |
| --- | --- | --- | --- | --- | --- |
| int(3) | remote(3) | fedora-41(2) | root(5) | host(5) | sqlite(5) |
| sys(2) | podman(2) | fedora-40-aarch64(2) | | | |
| | | fedora-40(1) | | | |

edsantiago commented 5 days ago

This one is blowing up, and I'm tentatively blaming it on the recent VM update. Issue title changed accordingly.

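| x | x | x | x | x | x |
| --- | --- | --- | --- | --- | --- |
| int(9) | podman(10) | rawhide(6) | root(14) | host(15) | sqlite(15) |
| sys(6) | remote(5) | fedora-41(4) | rootless(1) | | |
| | | fedora-40-aarch64(2) | | | |
| | | fedora-41-aarch64(2) | | | |
| | | fedora-40(1) | | | |
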
edsantiago commented 4 days ago

cirrus-vm-get-versions, trimmed to remove packages that can't possibly (?) be causing this. New:

| | debian | prior-fedora | fedora | fedora-aws | rawhide |
| --- | --- | --- | --- | --- | --- |
| base | 13.5 | Generic | Generic-41-1.4 | ? | 42-0 |
| kernel | 6.11.6-1 | 6.8.5-301 | 6.11.6-300 | 6.11.6-300 | 6.12.0-0.rc6.20241105git2e1b3cc9d7f7.52 |
| conmon | 2.1.12-3 | 2.1.12-2 | 2.1.12-3 | 2.1.12-3 | 2.1.12-3 |
| containers-common | ? | 0.60.4-2 | 0.60.4-4 | 0.60.4-4 | 0.60.4-5 |
| crun | 1.18.2-1 | 1.17-1 | 1.18.2-1 | 1.18.1-1 | 1.18.2-1 |
| golang | 2:1.23\~2 | 1.22.7-1 | 1.23.2-2 | 1.23.2-2 | 1.23.2-2 |
| systemd | 257\~rc1-3 | 255.13-1 | 256.7-1 | 256.7-1 | 256.7-1 |

...and old (c20241016t144444z-f40f39d13, the VMs that were running fine):

| | debian | prior-fedora | fedora | fedora-aws | rawhide |
| --- | --- | --- | --- | --- | --- |
| base | 13.5 | 39-1.5 | Generic | ? | 42-0 |
| kernel | 6.11.2-1 | 6.5.6-300 | 6.8.5-301 | 6.8.5-301 | 6.8.5-301 |
| conmon | 2.1.12-1 | 2.1.12-2 | 2.1.12-2 | 2.1.12-2 | 2.1.12-3 |
| containers-common | ? | 1-99 | 0.60.4-1 | 0.60.4-1 | 0.60.4-1 |
| crun | 1.17-1 | 1.17-1 | 1.17-1 | 1.17-1 | 1.17-1 |
| golang | 2:1.23\~2 | 1.22.8-1 | 1.22.7-1 | 1.22.7-1 | 1.23.2-2 |
| systemd | 256.7-1 | 254.18-1 | 255.13-1 | 255.13-1 | 256.7-1 |

(I can't use the script's --baseline helper because these are different fedoræ)
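As a stand-in for the baseline comparison, the two version listings can be diffed by hand. The sketch below is a hypothetical helper, not part of cirrus-vm-get-versions: it compares the old and new values column by column and lists the packages whose version string differs. The function names `parse` and `changed_packages` are invented for the example. Because the columns now hold different Fedora releases than before, a difference in a column only means the image under that name changed, not that a package was upgraded in place.

```python
# Version listings copied from the two tables (old VMs vs. new VMs).
OLD = """\
debian prior-fedora fedora fedora-aws rawhide
base 13.5 39-1.5 Generic ? 42-0
kernel 6.11.2-1 6.5.6-300 6.8.5-301 6.8.5-301 6.8.5-301
conmon 2.1.12-1 2.1.12-2 2.1.12-2 2.1.12-2 2.1.12-3
containers-common ? 1-99 0.60.4-1 0.60.4-1 0.60.4-1
crun 1.17-1 1.17-1 1.17-1 1.17-1 1.17-1
golang 2:1.23~2 1.22.8-1 1.22.7-1 1.22.7-1 1.23.2-2
systemd 256.7-1 254.18-1 255.13-1 255.13-1 256.7-1
"""

NEW = """\
debian prior-fedora fedora fedora-aws rawhide
base 13.5 Generic Generic-41-1.4 ? 42-0
kernel 6.11.6-1 6.8.5-301 6.11.6-300 6.11.6-300 6.12.0-0.rc6.20241105git2e1b3cc9d7f7.52
conmon 2.1.12-3 2.1.12-2 2.1.12-3 2.1.12-3 2.1.12-3
containers-common ? 0.60.4-2 0.60.4-4 0.60.4-4 0.60.4-5
crun 1.18.2-1 1.17-1 1.18.2-1 1.18.1-1 1.18.2-1
golang 2:1.23~2 1.22.7-1 1.23.2-2 1.23.2-2 1.23.2-2
systemd 257~rc1-3 255.13-1 256.7-1 256.7-1 256.7-1
"""


def parse(table):
    """Turn a whitespace-separated listing into {distro: {package: version}}."""
    lines = table.strip().splitlines()
    distros = lines[0].split()
    versions = {distro: {} for distro in distros}
    for line in lines[1:]:
        package, *values = line.split()
        for distro, value in zip(distros, values):
            versions[distro][package] = value
    return versions


def changed_packages(old, new):
    """List, per distro column, the packages whose version string differs.

    A "?" on both sides compares equal, i.e. unknown-to-unknown is
    reported as unchanged.
    """
    return {
        distro: [pkg for pkg, ver in new[distro].items() if old[distro].get(pkg) != ver]
        for distro in new
    }


if __name__ == "__main__":
    for distro, packages in changed_packages(parse(OLD), parse(NEW)).items():
        print(f"{distro}: {', '.join(packages) or '(no change)'}")
```

For the rawhide column, for example, this reports kernel, containers-common, and crun as the entries that differ between the two listings.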

Interesting observation: when we see this fail in system tests, it's often in the serial tests (the first pass). That points against it being a high-system-load issue.

So far, no luck reproducing on 1mt.