Open bcmills opened 2 years ago
> The extra complexity in `cmd/go` also leads to subtle bugs like #48490.
I wonder whether this is also such a bug: https://github.com/golang/go/issues/48496
> Ideally `cmd/compile` should choose defaults based on `GOMAXPROCS` rather than a hard-coded constant, since not all packages have uniform shape (especially machine-generated API bindings!) and not all users' workstations have the same number of hardware threads.
But `cmd/go` also runs multiple `cmd/compile` processes concurrently, and we don't want them stepping on each other's toes, right? Is there a good way to have them cooperate?
Or is the idea that `cmd/go` would set `GOMAXPROCS` for each subprocess based on what share of the CPUs it should utilize?
> But `cmd/go` also runs multiple `cmd/compile` processes concurrently, and we don't want them stepping on each other's toes, right?
Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅
> Is there a good way to have them cooperate?
Not sure. In theory we could have some kind of cross-process semaphore, but in practice... that's what the kernel's scheduler is supposed to be doing.
I would expect that ultimately the limiting factor on compilation concurrency would be RAM for parse trees and other per-file data structures, which really don't depend on the number of active threads — but perhaps my expectations are off somewhere.
The conclusion in CL 41192 was:
> Given the increased user-CPU costs as `c` increases, it looks like `c=4` is probably the sweet spot, at least for now.
However, as far as I can tell the analysis there mostly shows that increasing concurrency wouldn't help for the packages in `std` and `cmd` — I don't see anything in those charts that quantifies how much it would actually cost to go higher. (Presumably the "increased user-CPU costs" to which @josharian referred are mainly the costs of creating, scheduling, and destroying the additional threads, but with modern thread implementations those costs should be fairly low to begin with.)
A few minor observations.

As a compiler developer, it's really nice for a plain invocation of `go tool compile` to have `-c=1`, because it means that I can add debugging code freely without having to think about concurrency.

I benchmarked values of `c` over many different packages on a 96-core machine and found almost no benefit to values of `c` over 4. However, that was a long time ago, and newer hardware may have changed this.

> Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅
Because not all of them do their job well, and it's easy for us to control concurrent access. Same answer as for why a lot of tools have semaphores guarding I/O and other syscalls.
> If `cmd/compile` detects that other `gcflags` are incompatible with concurrent compilation, it should drop to sequential compilation on its own.
This seems eminently sensible, and would provide a lot of simplification. +1
> Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅
My impression was that you get the best throughput when the number of worker threads is close to the number of CPU cores, and increasing the number of workers past that leads to cache thrashing. E.g., if we already have N `cmd/compile` processes running across N cores, it doesn't help for the `cmd/compile` processes to internally switch back and forth between multiple functions; it just reduces memory locality.
> My impression was that you get the best throughput when the number of worker threads is close to the number of CPU cores, and increasing the number of workers past that leads to cache thrashing.
That matches my intuition, but if the kernel's time-slice granularity is big enough then it shouldn't matter much either way. (For comparison, Go's garbage collector also thrashes the cache, but GCs are infrequent enough that it's not that big a deal.)
> As a compiler developer, it's really nice for a plain invocation of `go tool compile` to have `-c=1`, because it means that I can add debugging code freely without having to think about concurrency.
Maybe? But if we move the choice to the `cmd/compile` side, perhaps you could instead set something in your environment (like `GO19CONCURRENTCOMPILATION=0` or `GODEBUG=concurrentcompile=0`) to change the default for `go tool compile`.
(If you invoke the compiler through `go/build`, you can already set `GOFLAGS=-gcflags=-c=1`.)
Something I wonder about is the "critical path". If the toolchain is able to build 10 packages in parallel, because we have enough CPUs at hand, but one of those packages should be given priority because it is imported by another dozen packages we also need to build — will it actually be given more CPU time earlier than the other 9?
> Maybe? But if we move the choice to the `cmd/compile` side, ...
This is something I feel strongly about. I don't want to have to remember to set up a special env var on new machines, and I don't want to have to explain it to new contributors, or debug what happens when they haven't. Plain `go tool compile` is currently a fairly friendly tool, and I'd like to keep it that way.
@mvdan we don't currently have any critical-path analysis, but that's a good point — `cmd/go` is the tool that sees the big picture, and is thus better placed to make resource-allocation decisions.
@kunaltyagi This discussion is about the Go tool invoking the compiler, so, yes, I think the idea of adding a GNU make jobserver implementation should be in a different issue. Thanks.
Got it, thanks.
In reviewing CL 351049, my attention was drawn to a complicated function that `cmd/go` uses to decide whether to request `cmd/compile` to use internal concurrency: https://github.com/golang/go/blob/b1bedc0774d8a3a7ff8778e933ee92e8638e9493/src/cmd/go/internal/work/gc.go#L213-L285

I asked the compiler team about it, and @rsc pointed out a parallel function in `cmd/compile`, which ultimately terminates the process if the flags are not compatible: https://github.com/golang/go/blob/c7f2f51fed15b410dea5f608420858b401887d0a/src/cmd/compile/internal/base/flag.go#L329-L364

Those two functions implement slightly different logic for when concurrent compilation is allowed, and I expect that they will only tend to drift more over time. Moreover, having the compiler reject values of `-c` greater than 1 when certain flags are in use makes it harder for users to override `cmd/go`'s current hard-coded default for the concurrency limit (which may be a factor in #17751). The extra complexity in `cmd/go` also leads to subtle bugs like #48490.

It's obviously reasonable for the compiler to use internal synchronization to avoid races and to produce deterministic output, and at the limit "dropping the concurrency to 1" is equivalent to "using internal synchronization". So, I think:

- `cmd/go` should avoid setting the `-c` flag explicitly at all, and instead let `cmd/compile` choose its own reasonable defaults.
- If `cmd/compile` detects that other `gcflags` are incompatible with concurrent compilation, it should drop to sequential compilation on its own. (That is, the `-c` flag should specify an upper limit, not an exact one.) That way, users can set `GOFLAGS=-gcflags=-c=N` and not worry about what other flags they might end up needing to pass.
- Ideally `cmd/compile` should choose defaults based on `GOMAXPROCS` rather than a hard-coded constant, since not all packages have uniform shape (especially machine-generated API bindings!) and not all users' workstations have the same number of hardware threads.

CC @josharian @randall77 @mvdan