golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.99k stars 17.54k forks source link

cmd/compile: use strings.Builder #23828

Closed josharian closed 2 years ago

josharian commented 6 years ago

The main challenge here is ensuring everything still works when built with 1.4.

Probably build tag protected files, with something like

type buffer struct { bytes.Buffer }

for pre-1.10 and

type buffer = strings.Builder

for 1.10+.

mdempsky commented 6 years ago

I've been thinking we should add a "cmd/internal/polyfill" (or however we want to spell the bikeshed) package that provides interfaces to sort.Slice, strings.Builder, and other recent stdlib additions that we want to use in the toolchain, while maintaining Go 1.4 compatibility.

bradfitz commented 6 years ago

@mdempsky, you could call the package just "util". ducks

davecheney commented 6 years ago

Why stop there? “common”, “shared”, or “base” are all perfectly cromulent package names.

On 15 Feb 2018, at 07:09, Brad Fitzpatrick notifications@github.com wrote:

@mdempsky, you could call the package just "util". ducks

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

martisch commented 6 years ago

cmd/internal/compat?

pciet commented 6 years ago

In cmd/compile/internal/gc there's these cases:

I assume there might be cases of “str” + “str2” or fmt.Sprintf that may be more efficient as a strings.Builder.

https://github.com/golang/go/issues/18990 indicates there are no benchmarks for Builder. I see go tool compile -bench that would allow a comparison of compiler things in ns/op with these changes. Without a clear performance win I don't see a reason to do this.

bradfitz commented 6 years ago

In any case, the point behind my snark was to make sure that the package does NOT become a "util" package that itself depends on many things the caller of your package may not want.

It might be valid for the first few things, but it'll inevitably grow, and the util-ness bloat will sneak up on us.

It might be best to have N such packages under some "compat" or "future" directory.

bradfitz commented 6 years ago

@pciet, there are tests for Builder's allocations. Not allocating as much helps CPU, running the GC less. It's hard to capture that in benchmarks sometimes. (or they end up noisy)

pciet commented 6 years ago

cmd/internal/compat/strbuf? Isn't compile the only thing that needs to be built with 1.4 though?

mdempsky commented 6 years ago

@bradfitz

In any case, the point behind my snark was to make sure that the package does NOT become a "util" package that itself depends on many things the caller of your package may not want.

I was imagining a package that's limited to just providing current standard library functionality, with backwards compatible implementations for bootstrapping under older Go releases. That is, for non-bootstrap builds, the packages don't do anything but pass through calls to the standard library.

It seems like you're suggesting that cmd shouldn't treat std as a single monolithic dependency, but instead should view each standard library package dependency individually. I understand why this is important within std, but it's not clear to me that it extends to cmd too.

bradfitz commented 6 years ago

I understand why this is important within std, but it's not clear to me that it extends to cmd too.

I personally don't care too much. I'm mostly echoing the objections I've heard about "util" packages in the past.

Practically, it matters less here because this is an internal package and we can therefore audit the fixed number of users of the "util" package and verify they're not pulling in anything unnecessary via the util package they wouldn't depend on otherwise via other paths.

dgryski commented 6 years ago

At what point to we bite the bullet and say that you need something newer than 1.4 to build the latest Go release and add an extra step to bootstrapping from C?

josharian commented 6 years ago

At what point to we bite the bullet

As late as humanly possible. Go 2, perhaps.

pciet commented 6 years ago

I can work on this.

I don’t like “util” packages but importing a few extra compatibility types isn’t going to be a big deal here, and we have to have a package since compile is split into horizontally dependent subpackages. I’d just add cmd/internal/compat and have a compat.StringBuffer.

My previous comment is coming from ignorance, sorry about that. Obviously at least the linker is also required, and adjusting the bootstrap is outside the scope of this issue. I’ll assume other commands need the compatibility types too.

Again from ignorance, I’m not seeing package testing style benchmarking (with the stability feature) for the compiler. I was thinking I’ll write cmd/compile/bench_test.go that sets up by compiling the gc dependencies then benchmark loops on main() targeting package gc.

I only have an amd64/linux computer to benchmark on though, are there resources for trying things on other platforms? I’ll just email golang-dev asking for benchmarks with the change if not.

josharian commented 6 years ago

Compiler benchmarking:

https://godoc.org/golang.org/x/tools/cmd/compilebench

https://github.com/josharian/compilecmp

josharian commented 6 years ago

Oh, and I assume that the vast majority of compilation occurs on amd64, so I wouldn’t sweat that.

pciet commented 6 years ago

I’m not seeing a significant performance difference with compilebench+benchstat. I’ll try to modify the cases where a Reader or .Bytes() call is used next, so far I've only done the direct replacement cases.

pciet commented 6 years ago

Based on what I'm reading via Google Search I'll leave these small return concat cases alone:

// fmt.go
return "[]" + tmodeString(t.Elem(), mode, depth)

return t.Etype.String() + "-" + typefmt(t, flag, FErr, depth)

return "[" + strconv.FormatInt(t.NumElem(), 10) + "]" + tmodeString(t.Elem(), mode, depth)

[actually these are where there may be wins with strings.Builder since a new allocation wouldn't have to be made to combine the strings]

gopherbot commented 6 years ago

Change https://golang.org/cl/94897 mentions this issue: cmd/compile: use strings.Builder

pciet commented 6 years ago

time+alloc benchmarks: https://github.com/golang/go/issues/16897

pciet commented 6 years ago

I’ve submitted two patch sets that are both off what this change should be but I’ve shown that func tconv in cmd/compile/internal/gc/fmt.go is the only potentially worthwhile place to add in strings.Builder. I’m working on a fresh patch set now [patch set three had better performance but not conclusively positive, and @davecheney says duplicating tconv with a builder implementation isn't maintainable enough which I agree with].

Specifically tconv is the only obvious place in cmd/compile/ (in tconv there's thousands of <1KB width, mostly <250 byte, string allocations made in most compile calls), but there may be worthwhile changes in dependencies outside this directory or in rearchitecting. If I submit another patch it will have higher code quality and a definite performance improvement.

And thanks for letting me work on this even though I've had to learn a lot and required feedback on suboptimal changes. I'll be out most of today through the weekend, I should be able to look at this again next week.

pciet commented 6 years ago

Here’s the time spent when compile (devel +1b1c8b3 Feb 17) is called on cmd/compile/internal/gc:

screen shot 2018-03-01 at 11 02 45 am screen shot 2018-03-01 at 11 03 12 am screen shot 2018-03-01 at 11 08 27 am

The tconv things (dumpobj1) account for only 6.6% of the compile time in this case, but garbage collection (runtime.gcDrain) accounts for 18.7%. It’s possible that dumpobj recursive string concatenation generates a significant part of the garbage, so next I’ll try to show where garbage is left. gc.bottomUpVisitor.visitcode, gc.compile, and ssa.Compile are other places to look.

bradfitz commented 6 years ago

You want to look at memory (allocation) profiles, not CPU profiles, if your goal is to eliminate the creation of garbage. Reducing garbage reduces CPU.

pciet commented 6 years ago

@bradfitz for some reason the pprof heap profile doesn’t show any detail into gc.Main which is why I went to CPU, but I’ll keep trying.

pciet commented 6 years ago

With debug.SetGCPercent(-1) and runtime.MemProfileRate = 1 compiling cmd/compile/internal/gc:

screen shot 2018-03-01 at 3 40 45 pm screen shot 2018-03-01 at 3 46 22 pm

I'm not sure why the dumpobj string concats don't end up here.

bradfitz commented 6 years ago

I primarily look at -alloc_space. It's space that ultimately triggers collections.

pciet commented 6 years ago

Well I’m not convinced making small changes like replacing some bytes.Buffer use is worth it, for larger changes I think I’ll need a lot more time to understand the compiler and in-flight efforts, and I’m in the middle of some life changes.

I’m unable to work on this more for now.

josharian commented 6 years ago

Thanks for looking into it.

prattmic commented 2 years ago

Note that after #44505, I don't believe it will be necessary to support building without strings.Builder.