Closed by kobalicek 1 year ago
Yeah, it's difficult to say. It could be something inside the VM or something outside it. Perhaps it's possible to run it through DTrace to debug it, though I'm not sure that works on a GHA runner. It might also be possible to redirect the output of the VM to a file and print that, to see what's going on.
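To make the "redirect the output to a file and print it" idea concrete, here is a minimal sketch. It is not how the action launches the VM; `run_and_log` and its parameters are hypothetical names, and the command passed in stands in for whatever actually starts the hypervisor.

```python
import subprocess
from pathlib import Path


def run_and_log(cmd, log_path, timeout=120, tail_lines=50):
    """Run cmd with stdout and stderr redirected to log_path.

    Returns (return_code, tail). return_code is None if the command
    was killed because it exceeded `timeout` seconds; tail is the
    last `tail_lines` lines written to the log.
    """
    log_file = Path(log_path)
    with log_file.open("wb") as log:
        try:
            proc = subprocess.run(
                cmd,
                stdout=log,
                stderr=subprocess.STDOUT,  # keep errors in the same log
                timeout=timeout,
            )
            return_code = proc.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child at this point.
            return_code = None
    lines = log_file.read_text(errors="replace").splitlines()
    return return_code, lines[-tail_lines:]
```

Printing the returned tail in a later workflow step would at least show whether the VM produced any output before the run stalled.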
Do you have a link to a failing job?
I have - actually two failing jobs within 2 days:
I'm not sure that would help though, as nothing interesting happens in these runs, it just stops at the beginning.
I am seeing the same problem:
and a couple of failing jobs:
The last successful NetBSD VM run was on 2023-09-21, https://github.com/OpenMPT/openmpt/actions/runs/6264853726 .
And one more:
I think this is the most unstable runner at the moment - it fails like this about 50% of the time
> it fails like this about 50% of the time
Oh, that's pretty bad. I'll see if I can debug the issue.
Seems like GitHub made some breaking changes again. This happens when trying to run QEMU:
dyld[1372]: Library not loaded: '/usr/local/opt/capstone/lib/libcapstone.4.dylib'
But it should always fail.
This makes it much easier to fix. I thought all the dependencies were statically linked to avoid this exact problem, but it looks like I missed one.
BTW, this is not specific to NetBSD; it applies to all platforms when QEMU is used as the hypervisor. But since xhyve is the default hypervisor for FreeBSD and OpenBSD on macOS runners, it doesn't affect those platforms unless you explicitly switch the hypervisor to QEMU.
If you're in a hurry you can switch to using Linux runners instead of macOS as a workaround, but macOS has better performance.
Yeah it always fails.
I'm removing NetBSD from my CI, as this just makes all builds fail.
I think it's just an unfortunate reality that NetBSD is not natively supported by GitHub.
Fixed in https://github.com/cross-platform-actions/action/releases/tag/v0.19.1. I've added a test to make sure this doesn't happen again. In doing that I also found another non-system dependency. But that is fixed as well.
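For readers who want a similar guard in their own builds, here is a rough sketch of a check for non-system library dependencies. It is not the test added in the release above. It assumes the input is the text printed by `otool -L <binary>` on macOS, and that anything outside `/usr/lib/` and `/System/Library/` counts as non-system. The function name and the version numbers in the sample are made up for illustration.

```python
# Prefixes treated as "system" locations (an assumption of this sketch).
SYSTEM_PREFIXES = ("/usr/lib/", "/System/Library/")

# Illustrative `otool -L` style output; only the capstone path comes
# from the error message in this thread, the rest is invented.
SAMPLE = (
    "/path/to/qemu-system-x86_64:\n"
    "\t/usr/local/opt/capstone/lib/libcapstone.4.dylib"
    " (compatibility version 5.0.0, current version 5.0.0)\n"
    "\t/usr/lib/libSystem.B.dylib"
    " (compatibility version 1.0.0, current version 1319.0.0)\n"
)


def non_system_dependencies(otool_output):
    """Return library paths that do not start with a system prefix."""
    flagged = []
    # The first line names the inspected binary, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Drop the trailing "(compatibility version ...)" part.
        path = line.split(" (", 1)[0]
        if not path.startswith(SYSTEM_PREFIXES):
            flagged.append(path)
    return flagged


print(non_system_dependencies(SAMPLE))
```

With the sample above, only the Homebrew capstone path is flagged. Entries such as `@rpath/...` would be flagged too, which may or may not be what you want.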
I'm having the following occasional issue when running the NetBSD runner:
I'm using QEMU to run it.
Basically the VM is not ready after 120 seconds, which causes the action to be terminated.
I'm not sure what the problem is in this case - whether the GHA runner is simply overloaded, or whether there is a race or something else caused by the action itself that makes it impossible to connect to the SSH server inside the VM.
I'm wondering - is this something we have to live with, or do you think it can be fixed somehow? It's very hard to diagnose as it doesn't happen every time, but it happens frequently enough to get my attention.
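To illustrate what a "VM not ready after 120 seconds" check can look like, here is a minimal sketch of a readiness poll with a deadline. It is an assumption about the general technique, not the action's actual code; `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeded before `timeout`
    seconds elapsed. Each attempt waits at most `interval` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A successful TCP connect only shows that something is listening on the port, not that the SSH server is ready for logins. A poll like this also cannot tell an overloaded runner from a VM that never started, which is why the failure is hard to diagnose from the timeout alone.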