Open bradfitz opened 10 months ago
The offending stack slot is SP+0x98
in tailscale.com/logtail.(*Logger).Write(SB)
.
It is a pointer slot, and is initialized here:
logtail.go:734 0x5e76a2 66440fd6bc2498000000 MOVQ X15, 0x98(SP)
If X15
wasn't properly initialized to 0, that might be the issue. I recall plan9 has some strange interactions with floating-point (can't be used in signal handlers?). This could also be the result of some plan9 assembly that doesn't properly zero X15
on transitions from abi0
to abiInternal
(which would be in assembly somewhere).
The other write to SP+0x98
can't be the problem, as it is of a known stack pointer.
logtail.go:751 0x5e7888 4c8d842488000000 LEAQ 0x88(SP), R8
logtail.go:751 0x5e7890 4c89842498000000 MOVQ R8, 0x98(SP)
We should already don't use X15 for zeroing on Plan 9. The rewriting rules for MOVOstoreconst
and DUFFZERO
are guarded with useSSE
and !noDuffDevice
, which are false on Plan 9.
Hmmm, https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/amd64/ggen.go;l=66 this one is not guarded. Maybe this is the problem.
Looks like CL https://go.dev/cl/453536 is the culprit. It added a new use of X15 without the guard.
@cherrymui, nice find! I look forward to what a test for this will look like 😅
For a test, I guess we could have an assembly function that clobbers X15, then call a function with a certain frame layout which would trigger the zeroing code, then check it is actually zeroed. The ABI wrapper between assembly and Go would zero X15, but I think we don't do that on Plan 9, so we can keep X15 clobbered.
Maybe we could have the compiler just error out if X15 (or any SSE) is used on Plan 9 in compiled code? (We should probably want to allow assembly code.)
I am wondering, since you are using SSE: should you be testing for buildcfg.GOAMD64 > 1, not isPlan9? I got burned at google by SSE when GOAMD64 got defaulted to v2 and I ran on an amd64 machine that did not have SSE enabled. We fixed the glitch, but added GOAMD64=v1 to the build environment. Is this a similar case? Using x15 without checking GOAMD64 seems dicey to me.
@rminnich I'm not sure what you are referring to - maybe 386? Because our amd64 architecture port requires SSE, and has since Go was released.
Removing release blocker since Plan 9 isn't a first class port.
@bradfitz @cherrymui the change to cmd/compile/internal/amd64/ggen.go:65 didn't fix the crash.
I think I found a second contributor or the cause; it is in runtime/stack.go:72. Both diffs are in go1-21-2-diffs.txt. Does it look reasonable?
Empirically, one or both fixes resolve the issue.
I don't understand what the stack fix is doing. It's just increasing the reserved stack for plan9 from 512 bytes to 4096 bytes. Is there some reason plan9 needs that extra stack? It doesn't sound like it is directly related to the X15
problem.
Looking at ggen.go's history: https://github.com/golang/go/commit/326df693d700fa42c2740dcc89a14c821d019f52
This commit switches from using R13 to using X15, saying:
Use XORL and X15 for zeroing in ggen's zerorange on AMD64
Prefer a SSE store from X15 over XORL REG REG -> MOVQ.
Use XORL REG, REG to setup 0 for REP STOS.
@cherrymui, @randall77, any thoughts on the mentioned ggen.go change?
What version of Go are you using (
go version
)?Go 1.21.0
Does this issue reproduce with the latest release?
Haven't yet tried Go 1.21.1.
What operating system and processor architecture are you using (
go env
)?plan9/amd64
What did you do?
A Plan 9 user (@9nut) sent me this crash trying to run Tailscale (
go install tailscale.com/cmd/tailscaled
at https://github.com/tailscale/tailscale/commit/6fd1961cd70385568ac02c69a60d4f48314bd767). I don't have enough Plan 9 knowledge to debug or repro:What did you expect to see?
Not a crash in
runtime.copystack
.What did you see instead?
A crash.