cmd/compile, runtime: add and use go:nosplitrec compilation directive

ianlancetaylor commented 7 years ago

A key error leading to #21306 was a call from a nosplit function to a function not marked nosplit, permitting preemption at a point where it was not permitted.

Therefore, just as we have go:nowritebarrierrec meaning that such a function may not call any function that permits write barriers, we should have go:nosplitrec meaning that such a function may not call any function that permits preemption.

Since nosplit is now a misnomer--the stack no longer splits--we could also consider a name like go:nopreemptrec.

cherrymui commented 7 years ago

Shouldgo:nosplit actually mean recursively nosplit? I think usually we want it doesn't grow stack or be preempted in the whole duration of the function. When would we want a particular function to be nosplit but it can call other functions that are not nosplit?

ianlancetaylor commented 7 years ago

A function like sigtrampgo is go:nosplit because it is called by the signal handler and g may be nil, but sigtrampgo sets up g and once that is done it is fine to call non-nosplit functions like sighandler.

Of course, we could change go:nosplit to default to being recursive and add a go:yessplit annotation, along the lines of go:yeswritebarrierrec.

aclements commented 7 years ago

Since nosplit is now a misnomer--the stack no longer splits--we could also consider a name like go:nopreemptrec.

The stack no longer splits, but there are places where we use this to mean "no stack growth", not "no preemption". Both currently have the same consequence, but maybe now would be a good time to tease them apart?

Of course, we could change go:nosplit to default to being recursive and add a go:yessplit annotation, along the lines of go:yeswritebarrierrec.

I like this idea in general, but it may be a bit of a pain. Maybe it would make sense to combine making it recursive by default with renaming it?

aclements commented 6 years ago

I've been looking through various uses of nosplit and there seem to be four distinct reasons for it. While we're pondering introducing a new annotation, I believe it's worth considering splitting out these different reasons, even if they have the same or similar consequences at the moment:

We're simply in a bad state for stack growth/scheduling. Ian's sigtrampgo example falls into this category (though I'm not convinced that's real; isn't it always called on the signal stack?). Also functions like mstart and needm, though again I think these may always be called on the g0 or gsignal stacks anyway. In this case, these functions may put the system into a state that's okay for growth/scheduling and then call regular functions, so this is where a non-recursive go:nosplit is still useful.
There are slots on the stack with unknown type, so we can't safely move or walk the stack. For example, everything related to entersyscall and exitsyscall, newdefer and deferreturn, as well as a lot of things related to the buffered write barrier fall in this category. This must be recursive and there's probably never any need to opt out of it. Interestingly, this is still about both stack moving and preemption because it's not safe to preempt and scan the stack. For this, I propose a new annotation called go:noscan (since both stack moving and preemption are about scanning) that applies recursively and prevents stack moving and preemption. In gc, this would drop the stack check prologue and could also affect loop preemption/etc. In toolchains that use segmented stacks like gccgo, I believe it is safe to create new stack segments in these situations, so it could block only preemption.
The function must not be preempted. For example, dolockOSThread and related functions muck with scheduler state and would break if their G moved. Similarly, there are functions like typedmemmove that must prevent preemption to block GC phase transitions. In these cases, it's actually okay to move the stack, but it's not okay to preempt them. I believe this should always apply recursively. For this, I propose a go:nosched annotation.
The function doesn't care, but it's a helper function to one of the above. I believe this is actually the vast majority of nosplit functions, which is unfortunate because there's rarely a clear reason why they're nosplit, making them very hard to maintain. For this, I propose that the compiler should automatically propagate the recursive annotations above as much as it can. The linker would still check everything in the end, so it would still be necessary to manually propagate across assembly and package boundaries, but this is quite rare.

ianlancetaylor commented 6 years ago

sigtrampgo is always called on the gsignal stack, but it is called directly from the signal handler, so the global g will point to the goroutine that was running when the signal was delivered. The stack check prologue will compare the stack pointer on the gsignal stack to g's goroutine stack, and do something horrible. We need to disable the stack check prologue for that function.

gopherbot commented 6 years ago

Change https://golang.org/cl/79615 mentions this issue: runtime: document sigtrampgo better

gopherbot commented 6 years ago

Change https://golang.org/cl/79815 mentions this issue: runtime: fix final stack split in exitsyscall

cherrymui commented 6 years ago

I assume the change won't happen in Go 1.10. @aclements has a plan with a few new directives.

aclements commented 6 years ago

Yeah, not happening for 1.10. I might try to prototype this before the tree opens, though.

Some thoughts from discussing these annotations with others:

It's possible we should rename go:nosplit to, say, go:nostackcheck. However, go:nosplit is publicly documented (go doc cmd/compile) and is used occasionally in code outside std. We could, of course, accept both and change it in the runtime, but it's not clear to me that's worthwhile.
It's possible we should have go:gcatomic instead of go:nosched. That's certainly more specific in many cases that need to block preemption in order to block GC phase transitions. On the other hand, it doesn't cover a handful of uses like dolockOSThread that specifically want to block G/M/P migration. Maybe those should just use go:nosplit or possibly acquirem/releasem. (/cc @RLH for thoughts)

There are also two more, more obscure uses of go:nosplit: 1) as an optimization in really hot functions, though I wasn't able to find any examples of this, and 2) on functions that call getcallerpc. The latter is for historical reasons, from when we actually split stacks and didn't want to sometimes get the PC of the "return from split" function. @ianlancetaylor, does gccgo still care about this? Even if it does, presumably the compiler should just infer this from the presence of a call to getcallerpc.

ianlancetaylor commented 6 years ago

gccgo never required any special handling for getcallerpc. It is implemented by a compiler intrinsic that simply fetches the return address (the GCC intrinsic __builtin_return_address).

cherrymui commented 6 years ago

I think @aclements's question is that we may want getcallerpc to return the PC of the actual caller, instead of __morestack (or whatever the trampoline is) if the function calling getcallerpc splits stack. Fetching the return address from the stack probably returns the latter.

ianlancetaylor commented 6 years ago

@cherrymui Ah, got it, thanks. Yes, for that to work right, in gccgo, any function that calls getcallerpc ought to be marked nosplit. (Which is not actually true today.)

aclements commented 6 years ago

Thanks. Could gccgo instead automatically mark any function that calls getcallerpc as nosplit? (Or getcallerpc could handle the trampoline return PC, like how it used to handle the stack barrier return PC.)

ianlancetaylor commented 6 years ago

gccgo could certainly mark any function that calls getcallerpc as nosplit, although it would be risky as gccgo has no test that prevents stack overflow of nosplit functions; it just relies on gc's test for that plus the fact that gccgo has a larger red zone. Adding that test would be difficult as there is no good place to put it, since gccgo is just a frontend that has no knowledge of stack sizes, and it uses an unmodified system linker.

Having getcallerpc handle the trampoline return PC would drastically affect its efficiency. In gccgo the only way to do that is to, in effect, turn getcallerpc into runtime.Callers. That would make it necessary to go through the gccgo version of libgo and remove the calls to getcallerpc that are only there for the race detector. Which could of course be done.

aclements commented 6 years ago

Having getcallerpc handle the trampoline return PC would drastically affect its efficiency. In gccgo the only way to do that is to, in effect, turn getcallerpc into runtime.Callers.

Oh dear. The stack trampoline doesn't save its own return PC anywhere convenient?

ianlancetaylor commented 6 years ago

gccgo lives in a more complex world than gc. The stack splitting code is not part of the Go frontend, it's part of GCC itself and the GCC support libraries. The Go library has no access to the data structures used by the GCC support libraries.

It is also true that the stack trampoline doesn't save the return PC anywhere in particular. It just saves the stack pointer. Quoting gcc/libgcc/config/i386/morestack.S:

# We do a little dance so that the processor's call/return return
# address prediction works out.  The compiler arranges for the caller
# to look like this:
#   call __generic_morestack
#   ret
#  L:
#   // carry on with function
# After we allocate more stack, we call L, which is in our caller.
# When that returns (to the predicted instruction), we release the
# stack segment and reset the stack pointer.  We then return to the
# predicted instruction, namely the ret instruction immediately after
# the call to __generic_morestack.  That then returns to the caller of
# the original caller.

gopherbot commented 6 years ago

Change https://golang.org/cl/83015 mentions this issue: runtime: mark heapBits.bits nosplit

aclements commented 6 years ago

Another instance where this would have helped: https://go-review.googlesource.com/c/go/+/85880/1

aclements commented 6 years ago

@ianlancetaylor, I'm sure it's more complicated than this, but reading over gcc's __morestack for x86-64, since the rsp from the old stack segment gets saved in rbp when __morestack calls L, could getcallerpc avoid all the complexity of actually walking the stack and work roughly like

x = get obvious caller pc (using rbp or frame size or whatever)
if x is not in __morestack
    return x
return *(saved rbp + 16)

I think the stack around the saved rbp looks like


f's return PC
morestack's return PC (to f)
morestack's saved rbp         <-- f's saved rbp

ianlancetaylor commented 6 years ago

@aclements The whole world is not an x86. Also, libgcc is not part of the gofrontend. I'd rather avoid having gofrontend depend on implementation details of libgcc.

I don't think it matters, though. I looked through gccgo's version of the runtime package, and, remembering that gccgo does not support the race detector, there was only one meaningful call to getcallerpc, from newosproc. It would be easy to make just that function nosplit, or, since it is called only from compiler-generated code, change the way it is called.

aclements commented 6 years ago

I'd rather avoid having gofrontend depend on implementation details of libgcc.

Sure. I hadn't thought that it would. I assumed (perhaps incorrectly) that getcallerpc would be lowered to __builtin_return_address and that __builtin_return_address would potentially have to deal with this issue even in non-Go code that used -fsplit-stack.

I don't think it matters, though. ... It would be easy to make just that function nosplit, or, since it is called only from compiler-generated code, change the way it is called.

Ah, great. That sounds like an easy solution. :)

aclements commented 6 years ago

I've been experimenting with my proposed go:noscan annotation and I'm increasingly thinking that the recursive effect is a mistake, but it could have a local effect combined with recursive verification.

The problem with automatically propagating it recursively is it can easily worm its way into lots of surprising parts of the runtime. Most of its surprise reach is easy to fix, but fixing it requires noticing it, and if the propagation is automatic, it's hard to notice.

But I also want to fix the problem of nosplit being added to helpers and then being hard to remove because you don't know why it was added.

So my alternate proposal is: Add both a go:noscan annotation and a go:noscancalled annotation. Both have the effect of disabling stack scanning for only the annotated function (meaning no split check in gc). The linker checks that functions are go:noscancalled if and only if they are reachable from a go:noscan function. This way, the effect of these annotations is local and maintained directly in the code (and hence more predictable), but the verification is still recursive. By making the check strict, we avoid keeping around unneeded annotations (much like our strict import and variable use checks).

ianlancetaylor commented 6 years ago

Are you suggesting any check for an unannotated function called by a go:noscan function? Or will that just be a potential bug?

aclements commented 6 years ago

Are you suggesting any check for an unannotated function called by a go:noscan function?

I am. I'm suggesting that the linker should check that all of the functions called by a go:noscan function must be go:noscancalled (perhaps or go:nosplit to make the transition easier). I'm also suggesting that the linker check that no additional functions are marked go:noscancalled. This would mean there are no surprises in what gets its stack check prologue removed.

One potential downside is that different build configurations may have somewhat different sets of go:noscancalled functions. I'm not sure how to deal with that.

Another possibility would be propagate the go:noscan annotation automatically (no go:noscancalled annotation), but have diagnostic output to report exactly which functions were automatically marked. We could check that in so there's an obvious diff in that file when something changes.

aclements commented 6 years ago

Alternatively, maybe doing this all purely statically is just the wrong approach. In most uses, it's fine if these helper functions grow the stack, so it's a little unfortunate that just because one of their callers is noscan, they lose the ability to grow that stack for all uses.

A possible hybrid static/dynamic approach would be for a noscan function to set a flag on entry (and clear it on exit) that inhibits stack growth even if the stack check fails and for the linker to check that the call tree does in fact fit in the nosplit space, so ignoring the stack check is okay.

This would have to be integrated with the linker's nosplit check because if a nosplit function calls a noscan function, the caller (and any further nosplit functions above it) have to count against the space. Things also get dicey with indirect calls, though that's already dicey (and as it turns out we don't make any problematic indirect calls in noscan call trees).

This approach wouldn't involve any recursive effect on generated code and would let us remove most of the nosplits from the runtime, while still letting us statically ensure the safety of this code.

bcmills commented 6 years ago

In most uses, it's fine if these helper functions grow the stack, so it's a little unfortunate that just because one of their callers is noscan, they lose the ability to grow that stack for all uses.

Another way to think of this is that the functions must have a static upper bound on stack size, even if the caller does not guarantee that the available stack space exceeds that bound.

(Perhaps we could name that go:nonrecursive or go:stacklimit instead of go:nosplit?)

Are there cases where we care about a static upper bound but that bound is parametric (say, with the stack bound of a passed-in function)?

aclements commented 6 years ago

Another way to think of this is that the functions must have a static upper bound on stack size, even if the caller does not guarantee that the available stack space exceeds that bound.

I think this is a valid generalization, but would be difficult to actually implement. Currently, splittable functions ensure there's enough available space for any chain of nosplit functions, and we can use that same global bound to ensure that noscan call trees always fit on the stack.

You could imagine that the prologue of a noscan function could grow the stack to whatever bound was required by its call tree (just use this size instead of its own frame size for the growth prologue). But there are some complications: 1. the prologue is written by the compiler, but the stack bound analysis needs to be done by the linker since it may cross assembly functions or package boundaries, and 2. for at least syscalls and and the write barrier, by the time we reach the noscan function we already can't grow the stack (for syscalls, because of the arguments to the syscall, and for the write barrier because its arguments are passed in registers).

Are there cases where we care about a static upper bound but that bound is parametric (say, with the stack bound of a passed-in function)?

Not for the functions I'm planning to mark noscan. They don't make any indirect function calls (at least, not without first switching to the system stack).

eliasnaur commented 6 years ago

Two cases where I believe a go:nosplitrec or go:noscan would have helped:

CL 21314 would have missed a go:nosplit for the runtime.crash function if Austin had not caught my mistake in review.

The breaking of the nosplit chain CL 111258 is fixing. A go:noscan would presumably have caught the subtle issue that the closure in systemstack(func() { ... }) is not marked nosplit.

aclements commented 6 years ago

Another example: CL 131277 fixed a missing nosplit. This is a case of "bad state for stack growth/scheduling" since the runtime may not be initialized when entering this function via libpreinit.

wsc1 commented 5 years ago

Since nosplit is now a misnomer--the stack no longer splits--we could also consider a name like go:nopreemptrec.

The naming and related breakdown of functionality needs to take into account 3rd party user code, as go:nosplit is available to 3rd parties independent of relation to the runtime.

The main reason I can think of for 3rd party code is to make it easier to express scheduling control points in the stack based/cooperative scheduling semantic. If that semantic is indeed going away, then one way to give control back to the go user programmer would be to introduce compiler directive which is independent of runtime internals like whether the stack is split or whether the GC uses a write barrier or some other kind of hypothetical hardware assist mechanism.

I would propose to set aside a directive for the go user, perhaps go:noruntime or, if needed also go:noruntimerec and go:runtimeresume which would have the semantics of go:nosplit* labels but would be classified as intended for consumers of the go runtime and not for implementors.

User directives would then need to be treated as such w.r.t. changes over time: they would be independent of runtime implementation and provide a means for a plain old Go programmer to turn on and off the runtime (and take associated responsibilities such as not calling package runtime functions and working with preallocated memory only, small local variables, ..., but I guess sys calls should be allowed in some form ).

This would enable clean real-time audio code which is user Go code, and may also enable a Go user to interface Go with various other operating system services which give guidelines that risk bad interactions with the runtime.

By classifying a separate directive to the Go user, it would also allow runtime and compiler development to proceed relatively independently of external consumers of go:nosplit and vice versa.

gopherbot commented 3 years ago

Change https://golang.org/cl/276173 mentions this issue: runtime: prevent stack growth after fork in runtime.sigfillset

aclements commented 1 year ago

Another example where nosplitrec would have prevented bugs: CL 431919

thanm commented 1 year ago

Another place where it would have helped described in https://github.com/golang/go/issues/56044.

gopherbot commented 1 year ago

Change https://go.dev/cl/441859 mentions this issue: runtime: mark arenaIdx.l1 and arenaIdx.l2 methods as nosplit

golang / go

cmd/compile, runtime: add and use go:nosplitrec compilation directive #21314