coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

kola testiso tests on f41+ sometimes time out #1796

Open jlebon opened 1 week ago

jlebon commented 1 week ago

We've seen this a couple of times in Bodhi testing, and now it has also shown up in next-devel:

[2024-09-16T08:12:53.904Z] FAIL: pxe-offline-install.bios (10m0.007s)
[2024-09-16T08:12:53.904Z]     timed out after 10m0s

But the system logs don't show anything unusual. It looks almost as if either QEMU was killed or kola lost contact with it. The pod has 9Gi of RAM and the testiso tests run serially, so memory limits shouldn't be a concern here.

pxe-offline-install.bios.zip

dustymabe commented 1 week ago

FTR, I looked at the code the other day to confirm that if the process is killed, a message should be printed to the console. I tested that today and it appears to hold:

Detected development build; disabling signature verification
Running test: pxe-offline-install.bios
FAIL: pxe-offline-install.bios (49.882s)
    QEMU unexpectedly exited while awaiting completion: process killed
Error: harness: test suite failed
2024-09-16T14:04:43Z cli: harness: test suite failed
failed to execute cmd-kola: exit status 1
+ rc=1
+ set +x

For this test I killed the QEMU process with kill -9.
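
To illustrate why these two failures look different, here is a minimal sketch of a harness telling a killed child process apart from a plain timeout. It is not kola's actual code (kola is written in Go). It is written in Python for brevity, the names wait_for_child and spawn_sleeper are hypothetical, and a sleeping Python process stands in for QEMU. It relies on POSIX signal semantics, under which a negative return code means the child died from a signal.

```python
import signal
import subprocess
import sys


def wait_for_child(proc: subprocess.Popen, timeout_s: float) -> str:
    """Classify how a child process ended within timeout_s seconds."""
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is still alive, so nothing killed it: this is the
        # "timed out" case. Clean up before reporting.
        proc.kill()
        proc.wait()
        return f"timed out after {timeout_s}s"
    if rc < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        return f"exited unexpectedly: killed by {signal.Signals(-rc).name}"
    return f"exited with status {rc}"


def spawn_sleeper(seconds: int) -> subprocess.Popen:
    # Hypothetical stand-in for the QEMU process: it only sleeps.
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )
```

In this sketch, sending SIGKILL to the child (the equivalent of kill -9) yields the "killed by SIGKILL" message, while a child that simply never finishes yields only the timeout message. That matches the observation in this issue: a bare timeout suggests the process was still running rather than killed.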

dustymabe commented 4 days ago

Saw this again today in CI for https://github.com/coreos/fedora-coreos-config/pull/3171

Opened https://github.com/coreos/fedora-coreos-pipeline/pull/1039 to see if we can get more information about the problem.