golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.88k stars 17.52k forks source link

go/format: add benchmarks #26528

Open kevinburke opened 6 years ago

kevinburke commented 6 years ago

I maintain a (fork of a) tool called go-bindata that compiles static files into a binary. Often the compiled file is quite large - tens of megabytes are not uncommon. I call format.Source on this generated file to avoid gofmt thrash. However, I noticed recently that calling format.Source takes 60% of the runtime of the go-bindata program.

Currently there are no benchmarks for format.Source. It might be good to add some to see if there are quick wins to get that function to return more quickly.

A starting point may be the benchmark here: https://github.com/kevinburke/go-bindata/blob/master/benchmark_test.go. Committing an extremely large file to source might not be the best way to do this, though.

josharian commented 6 years ago

I suspect go/format performance hotspots vary a lot based on input, with a long tail of pathological problematic inputs. Instead of a generic “add benchmarks” issue, which is hard to fix, let’s instead file performance issues for specific inputs, like go-bindata-style output.

We don’t want to add a large file to the repo, but I suspect you could create a suitable input (for reproduction purposes) on the fly, in memory.

Also, have you tried with 1.11 betas yet? I made some improvements to text/tabwriter this cycle that might help.

kevinburke commented 6 years ago

Also, have you tried with 1.11 betas yet? I made some improvements to text/tabwriter this cycle that might help.

I have, thanks! I'm not sure about the benchmarking output. When we use format.Source on the whole file, it looks slower.

1.9.7: https://travis-ci.org/kevinburke/go-bindata/jobs/406748540 tip: https://travis-ci.org/kevinburke/go-bindata/jobs/406748542

I just merged a change that only calls format.Source on a small part of the file. With that change it looks like there's an improvement between 1.9.7 and tip.

https://travis-ci.org/kevinburke/go-bindata/jobs/406757117 https://travis-ci.org/kevinburke/go-bindata/jobs/406757130

kevinburke commented 6 years ago

So what are my options here for a benchmark? I am guessing these are not possible:

What about downloading a large file from the public internet, and skipping the benchmark if the download fails? This one for example would be suitable. https://github.com/kevinburke/go-bindata/blob/master/testdata/assets/bindata.go

Otherwise, I suppose I could check in a medium size photo (or use an existing one in the git repository), and manually reconstruct the process of building the bindata file.

josharian commented 6 years ago

Skimming the beginning of that file, I'm guessing you could get similar performance behavior with just something like:

package p

var x1 = []byte("some very very very long string created with bytes.Repeat")

var x2 = []byte("another long string")

var x3 = []byte("as many of these as you need to recreate your performance characteristics")

That should be easy to create in memory with a plain old for loop. In any case, I'd start there and use pprof profiles to convince yourself that the cpu/memory behavior is similar. If it isn't, tweak the simple generated file until it is. If somehow (which I would find very surprising) the contents of the strings involved matter, just generate random bytes using math/rand.

gopherbot commented 6 years ago

Change https://golang.org/cl/125437 mentions this issue: go/format: add benchmark

bronze1man commented 6 years ago

@kevinburke You have another choose that you can skip the step format.Source to your generated go files. Almost no one will read your generated files,I think it is ok to leave it unformat, you just need it do not change with the same input data (like sort the import path part). You can also generate a formated generated go files. It is not so hard then it looks like. Both way can save you a lot of generate time.

I used to use format.Source to format all my generated files, then I find that simple skip the format step can save me more than 50% of the generate time. So I just skip the format step.

gopherbot commented 6 years ago

Change https://golang.org/cl/127928 mentions this issue: go/format: benchmark performance on large autogen file

gopherbot commented 5 years ago

Change https://golang.org/cl/153642 mentions this issue: go/format: add simple benchmark framework and basic benchmark