mdempsky closed this issue 8 years ago.
Yes please! I assume you can pass multiple phase:outfile pairs?
How about just -phasebench=prefix and benchmarking all phases, with a phase-specific suffix? I worry that, while optimizing one phase, decisions might get made that impact other phases without being noticed. In particular, frontend/backend interactions can be like this. As a concrete example, I have a CL waiting for 1.8 that removes some empty goto-label pairs that were introduced as a frontend convenience, the removal of which cuts overall compiler allocation by 1%.
To be clear, my intention was that -phasebench=prefix:outfile would measure all phases, writing the benchmarks to outfile, each prefixed by the string prefix. The idea being that compilebench could use -phasebench=BenchmarkGoTypes:/tmp/benchmark.txt, -phasebench=BenchmarkUnicode:/tmp/benchmark.txt, etc., and the output would be lines labeled BenchmarkGoTypes-parse, BenchmarkGoTypes-typecheck, BenchmarkUnicode-parse, BenchmarkUnicode-typecheck, etc.
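For illustration only, an outfile produced this way might then accumulate lines like the following; the timings are placeholder numbers, and only the labeling scheme comes from the description above:

BenchmarkGoTypes-parse 1 123456789 ns/op
BenchmarkGoTypes-typecheck 1 98765432 ns/op
BenchmarkUnicode-parse 1 23456789 ns/op
BenchmarkUnicode-typecheck 1 12345678 ns/op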
Yes. Funny coincidence, I spent this morning experimenting with adding structured statistics code to the front-end. This is an item on my TODOs.
My thinking at the moment:
The hierarchy permits collecting a lot of data in a structured (and labeled) way. The code augmentation is typically a single extra line per start/stop event.
A single "print" at the end of the compile dumps the output in a structured form.
I'm thinking that there should be a JSON variant for dumping the data to a file so that it can be processed later (i.e., so we can collect the data across many runs/times/versions and then produce graphs).
PS: Standard-benchmark data format should also be easily possible, but I think that format might be generated after the individual measurements (e.g., a single compile produces just one data set). It's also not clear to me how one would organize more structured data in that format.
Certainly not opposed to a structured API. My immediate idea was to start simple with instrumentation like m := benchmarker() and then calls to m.mark("parse"), m.mark("typecheck"), etc. appropriately scattered around. If you have anything more thought out I'm all ears. :)
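A minimal sketch of what that mark-based instrumentation could look like; benchmarker and mark are the names used above, while the type layout, the report helper, and the output format here are assumptions:

// phasetimer.go: hypothetical sketch of the mark-based phase timer described above.
package main

import (
	"fmt"
	"time"
)

type benchMarker struct {
	last   time.Time
	phases []string
	times  []time.Duration
}

func benchmarker() *benchMarker {
	return &benchMarker{last: time.Now()}
}

// mark records the time elapsed since the previous mark under the given phase name.
func (m *benchMarker) mark(phase string) {
	now := time.Now()
	m.phases = append(m.phases, phase)
	m.times = append(m.times, now.Sub(m.last))
	m.last = now
}

// report writes one benchmark-style line per recorded phase.
func (m *benchMarker) report(prefix string) {
	for i, phase := range m.phases {
		fmt.Printf("%s-%s 1 %d ns/op\n", prefix, phase, m.times[i].Nanoseconds())
	}
}

func main() {
	m := benchmarker()
	// ... parsing would happen here ...
	m.mark("parse")
	// ... typechecking would happen here ...
	m.mark("typecheck")
	m.report("BenchmarkGoTypes")
}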
As for format, one of my takeaways from @aclements' presentation about github.com/aclements/go-gg was that the benchmark format may be sufficient as long as we adequately structure the benchmark names. E.g., see the examples with names like BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8.
/cc @aclements for thoughts on data format
One thing to plan for: some phases may be less clear or overlap in a more heavily concurrent compiler. Also, ssa phases already happen in a cycle, once per function.
With "structured API" I didn't mean large framework. At the moment it's about 75 lines of code, with minimal code (one-liners) at measuring points in the front-end. The grouping mostly removes the need to think much about label names: Basically, introducing a new subgroup is a local operation; no need to rename other labels. For instance:
var stats = NewGroup("compilation of ...") // initialize compiler stats, a group of measurements
...
parse := stats.AddTiming("parse") // create and start a new time measurement
// ... parser invocations ...
parse.Stop() // stop the measurement and record the elapsed time
...
phase1 := stats.AddTiming("phase1")
...
phase1.Stop()
etc. A bit more code than you suggest.
That is, each time measurement gets its own Timing object which may track more than just the time. It may carry the number of bytes processed, or a count (iterations), or other things. Slightly more than just a benchmark map that tracks labels and times. But not a lot more.
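A rough sketch of how such a Group/Timing pair might be put together; NewGroup, AddTiming, and Stop come from the example above, while the field layout (including the Bytes counter) is an assumption rather than the actual CL:

package stats

import "time"

// Timing is a single start/stop measurement; it can carry extra data such as bytes processed.
type Timing struct {
	Label    string
	Start    time.Time
	Duration time.Duration
	Bytes    int64 // optional: e.g. source bytes parsed
}

// Stop records the elapsed time since the timing was started.
func (t *Timing) Stop() {
	t.Duration = time.Since(t.Start)
}

// Group is a named collection of timings and, possibly, subgroups.
type Group struct {
	Name    string
	Timings []*Timing
	Groups  []*Group
}

func NewGroup(name string) *Group {
	return &Group{Name: name}
}

// AddTiming creates and starts a new timing within the group.
func (g *Group) AddTiming(label string) *Timing {
	t := &Timing{Label: label, Start: time.Now()}
	g.Timings = append(g.Timings, t)
	return t
}

// AddGroup introduces a subgroup; a purely local operation, as described above.
func (g *Group) AddGroup(name string) *Group {
	sub := NewGroup(name)
	g.Groups = append(g.Groups, sub)
	return sub
}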
@josharian Agreed. That's why I was thinking of using names like "typecheck" instead of "phase1" or "phase2", so that the names remain meaningful long-term even if we end up reorganizing the specific phase boundaries.
@griesemer Okay, yeah, that's a little more code (explicitly marking begin/end of each phase rather than making it implicit by just marking transitions), but sounds roughly isomorphic to what I was thinking.
@mdempsky Maybe it's overkill to mark each start/end and it's good enough to just track the boundaries. But it seems to me that it might be more difficult to track a specific phase's time (say, escape analysis) if some of the other phases move around. Also, it would seem more difficult to attach bytes (to compute things like source bytes parsed per unit of time, etc.).
On the other hand, simpler is often better. Maybe it is good enough to just track transition times. Still exploring.
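As a usage sketch building on the hypothetical Group/Timing code above (placed in that same assumed package), attaching bytes and deriving a parse throughput could look like:

// throughput is a hypothetical example: attach the number of source bytes parsed
// to the timing and derive source bytes parsed per second.
func throughput(g *Group, srcBytes int64) float64 {
	t := g.AddTiming("parse")
	// ... the parser would run here, consuming srcBytes bytes of source ...
	t.Bytes = srcBytes
	t.Stop()
	return float64(t.Bytes) / t.Duration.Seconds()
}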
The ssa phases can all be timed. For example:
go build -gcflags -d=ssa/all/time step_hide.go
# command-line-arguments
./step_hide.go:13: early phielim TIME(ns) 9458 foo
./step_hide.go:13: early copyelim TIME(ns) 3365 foo
./step_hide.go:13: early deadcode TIME(ns) 288821 foo
./step_hide.go:13: short circuit TIME(ns) 293607 foo
./step_hide.go:13: decompose user TIME(ns) 303039 foo
./step_hide.go:13: opt TIME(ns) 2593795 foo
I think the format needs work; it doesn't seem to conform to what I recall Austin telling me is the desired form for the benchmarking commands we already have. Right now the function name comes last because function names sometimes have exciting syntax in them.
@mdempsky, @josharian Please have a look at https://go-review.googlesource.com/24462. After considering the feedback above I decided to abandon the structured data idea (it seemed like overkill) and go with a pretty simple approach, essentially along the lines of @mdempsky's proposal. I expect this to morph a bit over time, but it's a starting point.
CL https://golang.org/cl/24462 mentions this issue.
> As for format, one of my takeaways from @aclements' presentation about github.com/aclements/go-gg was that the benchmark format may be sufficient as long as we adequately structure the benchmark names. E.g., see the examples with names like BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8.
>
> /cc @aclements for thoughts on data format
I don't know exactly what data you want to represent, but the "extended" benchmark format is pretty flexible. It's not great at hierarchical things, but it's not clear you really want that. And, of course, you'll be able to use all of the tools that know how to speak it (which isn't much yet, but the format is young). You could have a benchmark line for each (file, phase) where the phase is a property, and results for time, allocations, etc. You might just want to name each benchmark after each phase. Something like
BenchmarkPhielem/file:step_hide.go/gomaxprocs:4 1 9458 ns/op 100 bytes-allocated 2 allocations
Maybe you want the function in there. That may be too detailed, but it would be easy to write a tool that aggregated it in various ways after the fact, e.g., using https://godoc.org/github.com/aclements/go-misc/bench.
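A minimal sketch of emitting such a line from Go; writePhaseBench and its parameters are hypothetical, and only the line layout follows the extended benchmark format example above:

package main

import (
	"fmt"
	"io"
	"os"
	"runtime"
	"time"
)

// writePhaseBench writes one extended-benchmark-format line for a compiler phase,
// encoding properties (file, gomaxprocs) into the benchmark name and appending
// extra metrics after the standard ns/op field.
func writePhaseBench(w io.Writer, phase, file string, d time.Duration, bytesAlloc, allocs int64) {
	fmt.Fprintf(w, "Benchmark%s/file:%s/gomaxprocs:%d 1 %d ns/op %d bytes-allocated %d allocations\n",
		phase, file, runtime.GOMAXPROCS(0), d.Nanoseconds(), bytesAlloc, allocs)
}

func main() {
	// Values taken from the example line above.
	writePhaseBench(os.Stdout, "Phielem", "step_hide.go", 9458*time.Nanosecond, 100, 2)
}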
The pending CL now enables writing to a file and is using the benchmark data format.
This seems to be done.
Right now we can use compilebench to measure the compiler's overall run time, and we can use -cpuprofile to measure fine-grained per-function/instruction profiles. But with the interest in continuing to evolve/rewrite individual phases of the compiler, I think it would be nice to track the compiler's performance on a per-phase granularity.
Rough proposal sketch: