coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

kola testiso tests on f41+ sometimes time out #1796

Open jlebon opened 2 months ago

jlebon commented 2 months ago

We've seen this a couple of times in the Bodhi testing, but now it also showed up in next-devel:

[2024-09-16T08:12:53.904Z] FAIL: pxe-offline-install.bios (10m0.007s)
[2024-09-16T08:12:53.904Z]     timed out after 10m0s

But the system logs don't show anything unusual. It's almost as if either QEMU was killed or kola just lost contact with it. The pod has 9Gi of RAM and testiso tests run serially, so memory limits shouldn't be a concern here.

pxe-offline-install.bios.zip

dustymabe commented 2 months ago

FTR I looked at the code the other day to make sure that if the QEMU process were killed, kola would print a message to the console. I tested that today and it seems to hold:

Detected development build; disabling signature verification
Running test: pxe-offline-install.bios
FAIL: pxe-offline-install.bios (49.882s)
    QEMU unexpectedly exited while awaiting completion: process killed
Error: harness: test suite failed
2024-09-16T14:04:43Z cli: harness: test suite failed
failed to execute cmd-kola: exit status 1
+ rc=1
+ set +x

(I killed the QEMU process with kill -9.)

dustymabe commented 2 months ago

Saw this again today in CI for https://github.com/coreos/fedora-coreos-config/pull/3171

Opened https://github.com/coreos/fedora-coreos-pipeline/pull/1039 to see if we can get more information about the problem.

dustymabe commented 1 month ago

Saw this again today in bodhi tests for https://bodhi.fedoraproject.org/updates/FEDORA-2024-5a61a2fa45

Unfortunately https://github.com/coreos/fedora-coreos-pipeline/pull/1039 doesn't help us here, because that change isn't used in those CI tests.