golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.72k stars 17.62k forks source link

cmd/compile: switch to a register-based calling convention for Go functions #40724

Open aclements opened 4 years ago

aclements commented 4 years ago

I propose that we switch the Go internal ABI (used between Go functions) from stack-based to register-based argument and result passing for Go ~1.16~ 1.17.

I lay out the details of our proposal and the work required in this document.

The ABI specification can be found here.

This was previously proposed in #18597, where @dr2chase did some excellent prototyping work that our new proposal builds on. With multiple ABI support, we’re now in a much better position to execute on this without breaking compatibility with the existing body of Go assembly code. I’ve opened this new issue to focus on discussion of our new proposal.

/cc @dr2chase @danscales @thanm @cherrymui @mknyszek @prattmic @randall77

An incomplete and evolving list of tasks:

High-priority non-critical path

Enabling steps

Testing

Post-MVP

Cleanup (can be done later)

seebs commented 4 years ago

Once upon a time, I was one of several people whose primary work activity for I think two days was tracking down a weird kernel crash, which was ultimately caused by callee-saved registers. Well, the crash was caused by a [FOO_MAX] array being overrun. But the thing where the result of that overrun was to overwrite a single automatic variable, which never had its address taken, five call frames away? That was caused by callee-saved registers.

Which is to say:

Platform ABIs typically define callee-save registers, which place substantial additional requirements on a garbage collector. There are alternatives to callee-save registers that share many of their benefits, while being much better suited to Go.

+1 +1 +1 +1 +1 +1 [...]

ncw commented 4 years ago

If I'm understanding the proposal correctly, assembler code will continue to be written with the current stack based calling conventions.

I understand the excellent reasoning behind doing this, I'll just note that this is surely going to frustrate assembly code writers not being able to pass and receive args in registers as often assembly code fragments are tiny and the overhead of calling them is massive.

mknyszek commented 4 years ago

@ncw This restriction to ABI0 for assembly writers isn't necessarily permanent. See https://go.googlesource.com/proposal/+/master/design/27539-internal-abi.md#proposal.

laboger commented 4 years ago

Right now writeBarrier is a special builtin that allows arguments to be passed in registers. Couldn't that same concept be extended to others like memmove and memcpy to avoid call overhead even though they are asm?

aclements commented 4 years ago

@laboger , yes, we're considering special-casing just a few assembly functions. The hottest ones by far are memmove, memclrNoHeapPointers, and possibly syscall.Syscall. The write barrier calling convention is very specialized because its context is so unusual and performance-sensitive. We'll probably just switch the others to use the register ABI rather than anything more specialized (and teach the toolchain that those are special even though they're assembly).

mknyszek commented 4 years ago

A comment on the "stack growth" section of the doc, specifically about spilling register state for morestack into the guard space for a goroutine (copied from Gerrit):

I worry a little bit about cases where functions marked nosplit eventually call into a non-nosplit function. If the nosplit frames are large (but don't exceed the guard space) we might find out that we don't have any space left. What should we do in this case? Throwing doesn't seem quite right since this works just fine today (though, you have to be careful). Should we have "guard space" and a subset of that as "no, really, this is very guarded" space? That would require extending the guard space a bit further which isn't great...

One alternative is to manage per-goroutine space for this, but that's not great either, since its more memory used per goroutine.

cherrymui commented 4 years ago

Good point. This is especially true for architectures that have a lot registers. Maybe we could decide that some registers are never live across a call?

aclements commented 4 years ago

That's an excellent point. A few possible solutions come to mind:

  1. It would be really sad if we had to make more per-goroutine space for this since you almost always have space on the stack. One possibility is to spill to the stack if there's room, and otherwise we keep a pool of allocated "spill space" objects. For that, we could keep just one pre-allocated per P and if the goroutine needs it, it can pull it off the P, spill into it, and then once it's entered the runtime it can allocate a new one (probably getting it from a global allocation pool) and attach it to the P. On return, it can return it to the allocation pool.

  2. A similar option would be to have just one spill space per P. Again, use the stack if you can, otherwise spill into the P, enter the runtime, grow the stack even if it's just a preemption, and then dump the spilled registers back onto the stack in the right place.

  3. We could revisit the nosplit rules. It's often (though not always) a mistake to call from a nosplit function to a non-nosplit function. I'm not sure exactly what this would look like. Maybe a non-nosplit function called from a nosplit function has to have a special prologue? Related: https://github.com/golang/go/issues/21314#issuecomment-346480985. I think the only cases I enumerated in that comment that intentionally call from a nosplit to a non-nosplit function aren't running on a user G stack and thus can't grow it anyway.

gopherbot commented 4 years ago

Change https://golang.org/cl/252258 mentions this issue: cmd/asm: define a macro for GOEXPERIMENT=regabi

gopherbot commented 4 years ago

Change https://golang.org/cl/252257 mentions this issue: cmd/internal/objabi: add regabi GOEXPERIMENT

gopherbot commented 4 years ago

Change https://golang.org/cl/258938 mentions this issue: runtime,cmd/cgo: simplify C -> Go call path

aclements commented 4 years ago

@alexbrainman , I just ran into a bit of a snag on Windows that maybe you can better inform me on. It looks like syscall.NewCallback is going to also need an implementation of the Go internal ABI rules and I'm trying to figure out exactly what that needs to look like. (I pinged you because it looks like you last substantially touched that code, but feel free to pass the ping.)

The documentation says "The argument is expected to be a function with one uintptr-sized result. The function must not have arguments with size larger than the size of uintptr." Does this actually mean the arguments (and result) must be an integral, floating-point, or pointer type? Or does it mean, say, struct { x, y uint16 } or struct { x [4]byte } is allowed, since those are technically no larger than the size of a uintptr? The current implementation permits the latter, though I don't know if it actually works.

If they must be int/float/pointer, then mapping from the Windows ABI to the Go register ABI is relatively straightforward. If compounds are allowed, then this gets much harder.

networkimprov commented 4 years ago

cc @jstarks re the ABI question.

alexbrainman commented 4 years ago

@aclements I reckon there are very few instances of syscall.NewCallback usage. But they are used a lot.

If you want to build Windows GUI, you have to use syscall.NewCallback, because Windows GUI API requires syscall.NewCallback.

Similarly, if you want to build Windows service, you have to use syscall.NewCallback.

These two pretty much cover syscall.NewCallback usage.

For Windows service, you can look in golang.org/x/sys/windows/svc. It uses this RegisterServiceCtrlHandlerEx WIndows API

https://docs.microsoft.com/en-us/windows/win32/api/winsvc/nf-winsvc-registerservicectrlhandlerexw

which requires LPHANDLER_FUNCTION_EX

https://docs.microsoft.com/en-us/windows/win32/api/winsvc/nc-winsvc-lphandler_function_ex

so LPHANDLER_FUNCTION_EX returns DWORD (uint32).

For GUI grep in golang.org/x/exp/shiny - it uses RegisterClass

https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registerclassw

which uses WNDCLASSW.lpfnWndProc

https://docs.microsoft.com/en-us/windows/win32/api/winuser/ns-winuser-wndclassw

which is WNDPROC

https://docs.microsoft.com/en-us/previous-versions/windows/desktop/legacy/ms633573(v=vs.85)

which returns LRESULT which is LONG_PTR or uintptr (in Go)

https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types

Also I recommend you look at github.com/lxn/walk for their syscall.NewCallback usage. This is an ultimate package to create GUI apps. But, I doubt, you will find different usage than golang.org/x/exp/shine. (I did not look myself)

I reckon, that is about it. For Microsoft APIs (maybe google code for syscall.NewCallback or windows.NewCallback).

But some people might use it in their custom / proprietary interface. I am not sure how these are supported by Go. You will be the judge.

Alex

CAFxX commented 4 years ago

IIUC this example in the microsoft docs shows that compounds are allowed:

struct Struct2 {
   int j, k;    // Struct2 fits in 64 bits, and meets requirements for return by value.
};
Struct2 func4(int a, double b, int c, float d);
// Caller passes a in RCX, b in XMM1, c in R8, and d in XMM3;
// callee returns Struct2 result by value in RAX.
gopherbot commented 4 years ago

Change https://golang.org/cl/262197 mentions this issue: go/analysis/passes/asmdecl: add support for ABI selector clauses

gopherbot commented 4 years ago

Change https://golang.org/cl/262319 mentions this issue: reflect,runtime: use internal ABI for selected ASM routines

gopherbot commented 4 years ago

Change https://golang.org/cl/262317 mentions this issue: cmd/dist,cmd/go: broaden use of asm macro GOEXPERIMENT_REGABI

gopherbot commented 4 years ago

Change https://golang.org/cl/259445 mentions this issue: cmd/compile,cmd/link: initial support for ABI wrappers

gopherbot commented 4 years ago

Change https://golang.org/cl/260477 mentions this issue: cmd/asm: allow def/ref of func ABI when compiling runtime

gopherbot commented 4 years ago

Change https://golang.org/cl/262318 mentions this issue: cmd: go get golang.org/x/tools@d1624618 && go mod vendor

gopherbot commented 4 years ago

Change https://golang.org/cl/262117 mentions this issue: cmd/compile: delay expansion of OpArg until expand_calls

gopherbot commented 4 years ago

Change https://golang.org/cl/249458 mentions this issue: cmd/compile: avoid generating CSEs; do all aggregates; maintain debug names

gopherbot commented 4 years ago

Change https://golang.org/cl/263271 mentions this issue: runtime: fix sub-uintptr-sized Windows callback arguments

aclements commented 4 years ago

Just an update on this: we were hoping to land this for 1.16, but that was always going to be a bit of a stretch, and at this point we won't be able to make 1.16. Since we're making good progress and have a lot of work in flight, the current plan is to branch when the freeze comes and finish up the work there. Landing this at the beginning of 1.17 isn't the worst thing since that will give it lots of soak time. There's some concern about the risk of landing both this and generics in 1.17, but I believe they will be fairly orthogonal and the nature of the register ABI work is such that we should be able to test it pretty thoroughly and things are likely to fail loudly if there are bugs.

aarzilli commented 4 years ago

There's some concern about the risk of landing both this and generics in 1.17

1.17 -> 2.0?

rsc commented 4 years ago

It sounds like everyone involved is on board for doing this in Go 1.17. Adding to proposal minutes for wider visiblity, but seems headed for likely accept.

mewmew commented 4 years ago

An incomplete and evolving list of tasks:

@aclements Consider adding a bullet point tracking the implementation of DWARF locations for arguments and results (contained in registers). Having good support for Go calling conventions in GDB and Delve is well worth a tracking bullet point :)

This work is already outlined in https://go.googlesource.com/proposal/+/master/design/40724-register-calling.md

DWARF locations: The compiler will need to generate DWARF location lists for arguments and results. It already has this ability for local variables, and we should reuse that as much as possible. We will need to ensure Delve and GDB are compatible with this. Both already support location lists in general, so this is unlikely to require much (if any) work in these debuggers.

aclements commented 4 years ago

@mewmew , thanks, I've added some bullets to the list for this. Hopefully there's nothing to actually do here, but you're right it's good to track that we need to make sure this works.

gopherbot commented 4 years ago

Change https://golang.org/cl/258137 mentions this issue: reflect,runtime: support amd64 register ABI for reflect.Call

gopherbot commented 4 years ago

Change https://golang.org/cl/264437 mentions this issue: internal/abi: add new internal/abi package for ABI constants

gopherbot commented 4 years ago

Change https://golang.org/cl/264438 mentions this issue: runtime: support register ABI for finalizers

rsc commented 3 years ago

Based on the discussion above, this seems like a likely accept.

gopherbot commented 3 years ago

Change https://golang.org/cl/266638 mentions this issue: reflect,runtime: use internal ABI for selected ASM routines, attempt 2

rsc commented 3 years ago

No change in consensus, so accepted.

gopherbot commented 3 years ago

Change https://golang.org/cl/270077 mentions this issue: runtime: simplify C -> Go callbacks on windows/arm

gopherbot commented 3 years ago

Change https://golang.org/cl/270863 mentions this issue: [dev.regabi] cmd/compile,cmd/link: initial support for ABI wrappers

gopherbot commented 3 years ago

Change https://golang.org/cl/271178 mentions this issue: runtime: support new callbackasm1 calling convention on windows/arm

gopherbot commented 3 years ago

Change https://golang.org/cl/272570 mentions this issue: runtime: support register ABI for finalizers

gopherbot commented 3 years ago

Change https://golang.org/cl/272567 mentions this issue: internal/abi: add new internal/abi package for ABI constants

gopherbot commented 3 years ago

Change https://golang.org/cl/272568 mentions this issue: reflect,runtime: support amd64 register ABI for reflect.Call

gopherbot commented 3 years ago

Change https://golang.org/cl/279514 mentions this issue: dashboard: add linux-amd64-regabi builder

gopherbot commented 3 years ago

Change https://golang.org/cl/279515 mentions this issue: dashboard: add linux-amd64-regabi builder

grantseltzer commented 3 years ago

Just want to confirm - this will be feature flagged in go 1.16?

ALTree commented 3 years ago

@grantseltzer No, this is scheduled for Go 1.17, it won't be in 1.16.

aclements commented 3 years ago

It was originally planned for 1.16, but that was always going to be tight and we missed the deadline, so now it's for 1.17. I'll update the top post to say 1.17.

gopherbot commented 3 years ago

Change https://golang.org/cl/279673 mentions this issue: dashboard: add linux-amd64-regabi builder

grantseltzer commented 3 years ago

Got it thanks for update. Regardless, sounds like it'll be feature flagged.

gopherbot commented 3 years ago

Change https://golang.org/cl/285292 mentions this issue: [dev.regabi] doc: add internal ABI specification

aclements commented 3 years ago

Re @mknyszek's comment:

A comment on the "stack growth" section of the doc, specifically about spilling register state for morestack into the guard space for a goroutine (copied from Gerrit):

@thanm had a good idea about eliminating the spill space we're currently reserving in the argument frame: we could charge the necessary spill space to the caller even if it doesn't reserve it. Let's say F calls G. F's prologue would check that there's enough stack space for F's frame as well as G's spill space, even though F would only open its own frame, which wouldn't include G's spill space. When F calls G, if G doesn't have enough space to open its frame (plus the spill space of anything it calls), it would enter the morestack path, and that could safely grow the stack enough to spill G's argument registers because F ensured there would be enough space.

We would have to make sure stack shrinking didn't shrink away this spill space, but it could make conservative assumptions about how much space must be reserved.

I'm not sure how this interacts with whether we have a compiler-generated spill path around the call to morestack or a generic spill path in morestack. It would be nice if the spill space was only as large as necessary for a function, since the maximal spill space is fairly large compared to the typical spill space. It's easy to have a compact spill space if the compiler generates the spill path. If we want a generic spill path with a compact spill space, I think we'd need morestack to switch to g0 using only scratch registers, temporarily spill everything there, lookup the function metadata, and copy the actual argument registers back to the g stack. That's pretty messy. I think we'll want to see how much code space actually goes to the compiler-generated spill paths.