golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/go,cmd/compile: let cmd/compile choose how much concurrency to use #48497

Open bcmills opened 2 years ago

bcmills commented 2 years ago

In reviewing CL 351049, my attention was drawn to a complicated function that cmd/go uses to decide whether to request cmd/compile to use internal concurrency: https://github.com/golang/go/blob/b1bedc0774d8a3a7ff8778e933ee92e8638e9493/src/cmd/go/internal/work/gc.go#L213-L285

I asked the compiler team about it, and @rsc pointed out a parallel function in cmd/compile, which ultimately terminates the process if the flags are not compatible: https://github.com/golang/go/blob/c7f2f51fed15b410dea5f608420858b401887d0a/src/cmd/compile/internal/base/flag.go#L329-L364

Those two functions implement slightly different logic for when concurrent compilation is allowed, and I expect that they will only drift further apart over time. Moreover, having the compiler reject values of -c greater than 1 when certain flags are in use makes it harder for users to override cmd/go's current hard-coded default for the concurrency limit (which may be a factor in #17751).

The extra complexity in cmd/go also leads to subtle bugs like #48490.

It's obviously reasonable for the compiler to use internal synchronization to avoid races and to produce deterministic output, and at the limit “dropping the concurrency to 1” is equivalent to “using internal synchronization”. So, I think cmd/go should simply request concurrent compilation, and if cmd/compile detects that other gcflags are incompatible with concurrent compilation, it should drop to sequential compilation on its own.
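To illustrate the equivalence claimed above, here is a minimal sketch (not cmd/compile's real internals; `compileFuncs` and its inputs are invented for illustration) of a counting-semaphore worker pool: with capacity `c == 1` the semaphore admits one worker at a time, so "concurrency limit 1" degenerates to fully sequential execution.

```go
package main

import (
	"fmt"
	"sync"
)

// compileFuncs "compiles" each function with at most c workers in
// flight. A buffered channel serves as a counting semaphore; when
// c == 1 the work is effectively serialized.
func compileFuncs(funcs []string, c int) []string {
	sem := make(chan struct{}, c) // counting semaphore of capacity c
	results := make([]string, len(funcs))
	var wg sync.WaitGroup
	for i, fn := range funcs {
		wg.Add(1)
		go func(i int, fn string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = "compiled " + fn
		}(i, fn)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(compileFuncs([]string{"f", "g", "h"}, 1))
	// prints [compiled f compiled g compiled h]
}
```

Because each worker writes to its own index, the output is deterministic regardless of `c`; only the degree of parallelism changes.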

CC @josharian @randall77 @mvdan

cuonglm commented 2 years ago

The extra complexity in cmd/go also leads to subtle bugs like #48490.

I wonder whether this is also a bug https://github.com/golang/go/issues/48496

mdempsky commented 2 years ago

Ideally cmd/compile should choose defaults based on GOMAXPROCS rather than a hard-coded constant, since not all packages have uniform shape (especially machine-generated API bindings!) and not all users' workstations have the same number of hardware threads.

But cmd/go also runs multiple cmd/compile processes concurrently, and we don't want them stepping on each other's toes, right? Is there a good way to have them cooperate?

Or is the idea that cmd/go would set GOMAXPROCS for each subprocess based on what share of CPUs it should utilize?

bcmills commented 2 years ago

But cmd/go also runs multiple cmd/compile processes concurrently, and we don't want them stepping on each other's toes, right?

Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅

Is there a good way to have them cooperate?

Not sure. In theory we could have some kind of cross-process semaphore, but in practice... that's what the kernel's scheduler is supposed to be doing.

I would expect that ultimately the limiting factor on compilation concurrency would be RAM for parse trees and other per-file data structures, which really don't depend on the number of active threads — but perhaps my expectations are off somewhere.

The conclusion in CL 41192 was:

Given the increased user-CPU costs as c increases, it looks like c=4 is probably the sweet spot, at least for now.

However, as far as I can tell the analysis there mostly shows that increasing concurrency wouldn't help for the packages in std and cmd — I don't see anything in those charts that quantifies how much it would actually cost to go higher. (Presumably the “increased user-CPU costs” to which @josharian referred are mainly the costs of creating, scheduling, and destroying the additional threads, but with modern thread implementations those costs should be fairly low to begin with.)

josharian commented 2 years ago

A few minor observations.

Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅

Because not all of them do their job well, and it's easy for us to control concurrent access. Same answer as for why a lot of tools have semaphores guarding I/O and other syscalls.

If cmd/compile detects that other gcflags are incompatible with concurrent compilation, it should drop to sequential compilation on its own.

This seems eminently sensible, and would provide a lot of simplification. +1
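A rough sketch of that proposal (the `flags` struct and the specific incompatibility rules are illustrative simplifications of cmd/compile/internal/base, not the real code): instead of terminating when a flag is incompatible with concurrent compilation, the compiler lowers its own concurrency to 1.

```go
package main

import "fmt"

// flags is a toy stand-in for the compiler's parsed flag set.
type flags struct {
	Concurrency int  // the -c flag
	LowerS      bool // -S: assembly output interleaves badly when concurrent
	Live        int  // -live: debug output ordering matters
}

// effectiveConcurrency drops to sequential compilation when a
// debug-output flag is set, rather than rejecting the flag combination
// outright.
func effectiveConcurrency(f flags) int {
	if f.LowerS || f.Live > 0 {
		return 1
	}
	if f.Concurrency < 1 {
		return 1
	}
	return f.Concurrency
}

func main() {
	fmt.Println(effectiveConcurrency(flags{Concurrency: 4}))               // 4
	fmt.Println(effectiveConcurrency(flags{Concurrency: 4, LowerS: true})) // 1
}
```

Under this scheme cmd/go could always pass a concurrency request and never needs to replicate the compiler's compatibility rules.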

mdempsky commented 2 years ago

Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? sweat_smile

My impression was that you get the best throughput when the number of worker threads is close to the number of CPU cores, and increasing the number of workers past that leads to cache thrashing. E.g., if we already have N cmd/compile processes running across N cores, it doesn't help for the cmd/compile processes to internally switch back and forth between multiple functions; it just reduces memory locality.

bcmills commented 2 years ago

My impression was that you get the best throughput when the number of worker threads is close to the number of CPU cores, and increasing the number of workers past that leads to cache thrashing.

That matches my intuition, but if the kernel's time-slice granularity is big enough then it shouldn't matter much either way. (For comparison, Go's garbage collector also thrashes the cache, but GCs are infrequent enough that it's not that big a deal.)

bcmills commented 2 years ago

As a compiler developer, it's really nice for a plain invocation of go tool compile to have -c=1, because it means that I can add debugging code freely without having to think about concurrency.

Maybe? But if we move the choice to the cmd/compile side, perhaps you could instead set something in your environment (like GO19CONCURRENTCOMPILATION=0 or GODEBUG=concurrentcompile=0) to change the default for go tool compile.

(If you invoke the compiler through go/build, you can already set GOFLAGS=-gcflags=-c=1.)

mvdan commented 2 years ago

Something I wonder about is the "critical path". If the toolchain is able to build 10 packages in parallel because we have enough CPUs at hand, but one of those packages should be given priority because it is imported by another dozen packages we also need to build, will it actually be given CPU time earlier than the other 9?
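One way to make that priority concrete, as a minimal sketch (not cmd/go's real scheduler; the function and its input shape are invented for illustration): give each package a priority equal to the longest chain of packages that transitively import it, so widely-depended-on packages would be started first.

```go
package main

import "fmt"

// priorities computes, for each package, the length of the longest
// chain of importers above it in the build graph. importers maps a
// package to the packages that directly import it.
func priorities(importers map[string][]string) map[string]int {
	memo := make(map[string]int)
	var depth func(pkg string) int
	depth = func(pkg string) int {
		if d, ok := memo[pkg]; ok {
			return d
		}
		best := 0
		for _, imp := range importers[pkg] {
			if d := depth(imp) + 1; d > best {
				best = d
			}
		}
		memo[pkg] = best
		return best
	}
	for pkg := range importers {
		depth(pkg)
	}
	return memo
}

func main() {
	// "base" is imported by "mid", which is imported by "app".
	importers := map[string][]string{
		"base": {"mid"},
		"mid":  {"app"},
		"app":  {},
	}
	fmt.Println(priorities(importers)["base"]) // 2: two packages wait on it
}
```

A scheduler could then start ready-to-build packages in decreasing priority order, or give high-priority compiles a larger CPU share.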

josharian commented 2 years ago

Maybe? But if we move the choice to the cmd/compile side, ...

This is something I feel strongly about. I don't want to have to remember to set up a special env var on new machines, and I don't want to have to explain to new contributors about it, or debug what happens when they haven't. Plain go tool compile is currently a fairly friendly tool, and I'd like to keep it that way.

josharian commented 2 years ago

@mvdan we don't currently have any critical path analysis, but that's a good point—cmd/go is the tool that sees the big picture, and is thus better placed to make resource allocation decisions.

kunaltyagi commented 2 years ago
EDIT: Not relevant to discussion.

Would there be a GNU make jobserver compatible implementation? That would allow polyglot code bases to be compiled without stepping on many toes. This could be an advanced option, not required for 80% of use cases. Currently the tooling for C/C++ and Rust already supports it, but Go doesn't, making the build scripts slightly sequential (not a big deal right now).

I assume this would also open doors for some very neat integration with on-prem CI (e.g. high utilization without page thrashing due to context switches when the total number of processes exceeds the number of cores, which is difficult to control right now).

PS: am I going about this in the wrong thread? I think this is tangentially related to the topic at hand.
ianlancetaylor commented 2 years ago

@kunaltyagi This discussion is about the Go tool invoking the compiler, so, yes, I think the idea of adding a GNU make jobserver implementation should be in a different issue. Thanks.

kunaltyagi commented 2 years ago

Got it, thanks.

I've created a separate issue (https://github.com/golang/go/issues/52387) and hidden the previous comment to avoid derailing the conversation further.