Closed 4a6f656c closed 3 years ago
/cc @cherrymui
It may be because it uses R27 (REGTMP). The injected call, runtime.asyncPreempt, clobbers R27. So if R27 is live it cannot be async preempted. However, R27 is only live within those three instructions. If we are preempted there we can just restart at first instruction. So it should be marked restartable (_PCDATA_Restart1 or _PCDATA_Restart2), and it indeed is. The code in isAsyncSafePoint seems to return too early for those values (the code below actually expects those values). I'll fix.
That said, I would expect the signal does not always land on that instruction. Some signal would land on a preemptible region (e.g. a MUL instruction) and it can preempt. Maybe the signal delivery on OpenBSD on that machine is somehow biased?
Change https://golang.org/cl/340011 mentions this issue: runtime: accept restartable sequence pcdata values in isAsyncSafePoint
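To make the distinction concrete, here is a minimal sketch of the decision described above. It is not the runtime's code. `resumePC` and `buggyResumePC` are hypothetical helpers, the numeric constant values are written from memory to mirror the runtime's names, and the `startpc` and `entry` addresses are invented for illustration (only `0xcf408` comes from this report).

```go
package main

import "fmt"

// Unsafe-point PCDATA values. The names mirror the runtime's; the numeric
// values are assumptions for this sketch and should be treated as illustrative.
const (
	pcdataUnsafePointSafe   = -1 // safe for async preemption
	pcdataUnsafePointUnsafe = -2 // unsafe for async preemption
	pcdataRestart1          = -3 // restartable sequence
	pcdataRestart2          = -4 // restartable sequence
	pcdataRestartAtEntry    = -5 // restart at function entry
)

// buggyResumePC models the early return: anything other than "safe" is
// rejected, so a PC inside a restartable sequence can never be preempted.
func buggyResumePC(up int32, pc uintptr) (bool, uintptr) {
	if up != pcdataUnsafePointSafe {
		return false, 0
	}
	return true, pc
}

// resumePC models the intended behaviour: only "unsafe" is rejected, and a
// restartable sequence is preempted by backing the PC up to its start.
func resumePC(up int32, pc, startpc, entry uintptr) (bool, uintptr) {
	switch up {
	case pcdataUnsafePointSafe:
		return true, pc
	case pcdataRestart1, pcdataRestart2:
		if startpc == 0 || startpc > pc {
			return false, 0 // malformed table; the real runtime may react differently
		}
		return true, startpc
	case pcdataRestartAtEntry:
		return true, entry
	}
	return false, 0 // pcdataUnsafePointUnsafe or an unknown value
}

func main() {
	// pc is the address from the report; startpc and entry are hypothetical.
	pc, startpc, entry := uintptr(0xcf408), uintptr(0xcf400), uintptr(0xcf3f0)

	ok, newpc := buggyResumePC(pcdataRestart1, pc)
	fmt.Printf("early return: ok=%v newpc=%#x\n", ok, newpc)

	ok, newpc = resumePC(pcdataRestart1, pc, startpc, entry)
	fmt.Printf("restartable:  ok=%v newpc=%#x\n", ok, newpc)
}
```

With the early return, a signal that always lands inside the REGTMP sequence is rejected every time, which would match the timeout seen here.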
> It may be because it uses R27 (REGTMP). The injected call, runtime.asyncPreempt, clobbers R27. So if R27 is live it cannot be async preempted. However, R27 is only live within those three instructions. If we are preempted there we can just restart at first instruction. So it should be marked restartable (_PCDATA_Restart1 or _PCDATA_Restart2), and it indeed is. The code in isAsyncSafePoint seems to return too early for those values (the code below actually expects those values). I'll fix.

Ah! REGTMP was the bit that I was missing.
> That said, I would expect the signal does not always land on that instruction. Some signal would land on a preemptible region (e.g. a MUL instruction) and it can preempt. Maybe the signal delivery on OpenBSD on that machine is somehow biased?

So this is an interesting and valid point. Obviously this relies on a SIGURG being delivered to a specific thread and that is only going to happen when the thread performs a system call (none in this case) or a context switch. The context switch in turn will be triggered via a timer interrupt being delivered, resulting in a trap into the kernel. Hacking up some equivalent C code, I see the SIGURG being delivered on one of three instructions in the loop on this machine, whereas when run on a Pine64+ machine it lands on almost any instruction in the loop. I suspect that what we're seeing here may be architectural, with the trap occurring at particular points in pipeline execution - I'll ask some people who are more familiar with the M1 hardware.
OpenBSD's `arm64` port runs on Apple M1 hardware, however only currently stable when running with a single-processor kernel:

The Go `openbsd/arm64` port works as expected on this machine, with the exception of the `TestAsyncPreempt` failing due to a timeout (note this is not a new regression - the same failure occurs with both Go -tip and Go 1.16). The strange thing is that this test passes on the `openbsd/arm64` builder (running on a Pine64+ board), when running with both a single-processor and multi-processor kernel (worth noting this is a slightly older OpenBSD version, but I'm not currently aware of any kernel changes that would impact this).

Adding some debugging, we get async preempt signals being delivered to the process (`SIGURG`), however we do not preempt as we return false from `isAsyncSafePoint` via the if test at line 401 of `src/runtime/preempt.go` (in other words, `up` returned by `pcdatavalue2` is not `_PCDATA_UnsafePointSafe`).

Building `src/runtime/testdata/testprog` manually and running with the `AsyncPreempt` argument, shows that we keep calling `isAsyncSafePoint` with a `pc` of `0xcf408` - this is part of the `main.frameless` function:

I do not see any reason why that should be an unsafe point and if that is indeed the case, presumably this means that the PCDATA table has incorrect information. Also, from manual inspection, the code in this function decompiles to the same assembly on both machines.

Any hints in tracking this down further would be greatly appreciated.
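
For anyone wanting to reproduce the shape of the problem outside the test suite, the following is a rough sketch, not the actual `testprog` source. The names `spin`, `sink` and `preemptTightLoop` are hypothetical. It assumes that a goroutine in a call-free arithmetic loop can only be stopped by async preemption, and that a forced GC has to stop it.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"sync/atomic"
)

// sink keeps the loop's result live so the compiler cannot drop the loop.
var sink int64

// spin is a call-free loop, so it contains no synchronous safe points.
// It is only intended to resemble the kind of loop discussed in this issue.
//
//go:noinline
func spin() {
	for i := int64(0); i < 1<<62; i++ {
		sink += i * i * i * i * i * 12345
	}
}

// preemptTightLoop returns true once a GC cycle has completed while another
// goroutine is stuck in spin. If async preemption does not work, it never
// returns: there is no timeout, because with one P nothing else can run.
func preemptTightLoop() bool {
	// One P forces the scheduler to preempt spin before anything else runs.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)
	// Disable background GC for the duration, restoring the old setting after.
	defer debug.SetGCPercent(debug.SetGCPercent(-1))

	var ready uint32
	go func() {
		atomic.StoreUint32(&ready, 1)
		spin()
	}()

	// Yield until the spinning goroutine has started.
	for atomic.LoadUint32(&ready) == 0 {
		runtime.Gosched()
	}

	// A forced GC must stop the world, which requires preempting spin.
	runtime.GC()
	return true
}

func main() {
	fmt.Println("preempted:", preemptTightLoop())
}
```

On a platform where the signal is always rejected as an unsafe point, a program of this shape would be expected to hang in the `Gosched` loop or in `runtime.GC()`, which corresponds to the timeout reported above.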