mdempsky closed this issue 8 years ago.
Yes please! I assume you can pass multiple phase:outfile pairs?
How about just -phasebench=prefix and benchmarking all phases, with a phase-specific suffix? I worry that, while optimizing one phase, decisions might get made that impact other phases without being noticed. In particular, frontend/backend interactions can be like this. As a concrete example, I have a CL waiting for 1.8 that removes some empty goto-label pairs that were introduced as a frontend convenience, the removal of which cuts overall compiler allocation by 1%.
To be clear, my intention was that -phasebench=prefix:outfile would measure all phases, writing the benchmarks to outfile, each prefixed by the string prefix. The idea being that compilebench could use -phasebench=BenchmarkGoTypes:/tmp/benchmark.txt, -phasebench=BenchmarkUnicode:/tmp/benchmark.txt, etc., and the output would be lines labeled BenchmarkGoTypes-parse, BenchmarkGoTypes-typecheck, BenchmarkUnicode-parse, BenchmarkUnicode-typecheck, etc.
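For illustration only, an outfile produced this way might then accumulate lines like the following; the timings are placeholder numbers, and only the labeling scheme comes from the description above:

BenchmarkGoTypes-parse 1 123456789 ns/op
BenchmarkGoTypes-typecheck 1 98765432 ns/op
BenchmarkUnicode-parse 1 23456789 ns/op
BenchmarkUnicode-typecheck 1 12345678 ns/op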
Yes. Funny coincidence, I spent this morning experimenting with adding structured statistics code to the front-end. This is an item on my TODOs.
My thinking at the moment:
The hierarchy permits collecting a lot of data in a structured (and labeled) way. The code augmentation is typically a single extra line per start/stop event.
A single "print" at the end of the compile dumps the output in a structured form.
I'm thinking that there should be a JSON variant for dumping the data to a file so that it can be processed later (i.e., so we can collect the data across many runs/times/versions and then produce graphs).
PS: Standard-benchmark data format should also be easily possible, but I think that format might be generated after the individual measurements (e.g., a single compile produces just one data set). It's also not clear to me how one would organize more structured data in that format.
Certainly not opposed to a structured API. My immediate idea was to start simple with instrumentation like m := benchmarker() and then calls to m.mark("parse"), m.mark("typecheck"), etc. appropriately scattered around. If you have anything more thought out I'm all ears. :)
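A minimal sketch of what that mark-based instrumentation could look like; benchmarker and mark are the names used above, while the type layout, the report helper, and the output format here are assumptions:

// phasetimer.go: hypothetical sketch of the mark-based phase timer described above.
package main

import (
	"fmt"
	"time"
)

type benchMarker struct {
	last   time.Time
	phases []string
	times  []time.Duration
}

func benchmarker() *benchMarker {
	return &benchMarker{last: time.Now()}
}

// mark records the time elapsed since the previous mark under the given phase name.
func (m *benchMarker) mark(phase string) {
	now := time.Now()
	m.phases = append(m.phases, phase)
	m.times = append(m.times, now.Sub(m.last))
	m.last = now
}

// report writes one benchmark-style line per recorded phase.
func (m *benchMarker) report(prefix string) {
	for i, phase := range m.phases {
		fmt.Printf("%s-%s 1 %d ns/op\n", prefix, phase, m.times[i].Nanoseconds())
	}
}

func main() {
	m := benchmarker()
	// ... parsing would happen here ...
	m.mark("parse")
	// ... typechecking would happen here ...
	m.mark("typecheck")
	m.report("BenchmarkGoTypes")
}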
As for format, one of my takeaways from @aclements' presentation about github.com/aclements/go-gg was that the benchmark format may be sufficient as long as we adequately structure the benchmark names. E.g., see the examples with names like BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8.
/cc @aclements for thoughts on data format
One thing to plan for: some phases may be less clear or overlap in a more heavily concurrent compiler. Also, ssa phases already happen in a cycle, once per function.
With "structured API" I didn't mean large framework. At the moment it's about 75 lines of code, with minimal code (one-liners) at measuring points in the front-end. The grouping mostly removes the need to think much about label names: Basically, introducing a new subgroup is a local operation; no need to rename other labels. For instance:
var stats = NewGroup("compilation of ...") // initialize compiler stats, a group of measurements
...
parse := stats.AddTiming("parse") // create and start a new time measurement
// ... parser invocations ...
parse.Stop() // stop the measurement and record the elapsed time
...
phase1 := stats.AddTiming("phase1")
...
phase1.Stop()
etc. A bit more code than you suggest.
That is, each time measurement gets its own Timing object which may track more than just the time. It may carry the number of bytes processed, or a count (iterations), or other things. Slightly more than just a benchmark map that tracks labels and times. But not a lot more.
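A rough sketch of how such a Group/Timing pair might be put together; NewGroup, AddTiming, and Stop come from the example above, while the field layout (including the Bytes counter) is an assumption rather than the actual CL:

package stats

import "time"

// Timing is a single start/stop measurement; it can carry extra data such as bytes processed.
type Timing struct {
	Label    string
	Start    time.Time
	Duration time.Duration
	Bytes    int64 // optional: e.g. source bytes parsed
}

// Stop records the elapsed time since the timing was started.
func (t *Timing) Stop() {
	t.Duration = time.Since(t.Start)
}

// Group is a named collection of timings and, possibly, subgroups.
type Group struct {
	Name    string
	Timings []*Timing
	Groups  []*Group
}

func NewGroup(name string) *Group {
	return &Group{Name: name}
}

// AddTiming creates and starts a new timing within the group.
func (g *Group) AddTiming(label string) *Timing {
	t := &Timing{Label: label, Start: time.Now()}
	g.Timings = append(g.Timings, t)
	return t
}

// AddGroup introduces a subgroup; a purely local operation, as described above.
func (g *Group) AddGroup(name string) *Group {
	sub := NewGroup(name)
	g.Groups = append(g.Groups, sub)
	return sub
}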
@josharian Agreed. That's why I was thinking of using names like "typecheck" instead of "phase1" or "phase2", so that the names remain meaningful long-term even if we end up reorganizing the specific phase boundaries.
@griesemer Okay, yeah, that's a little more code (explicitly marking begin/end of each phase rather than making it implicit by just marking transitions), but sounds roughly isomorphic to what I was thinking.
@mdempsky Maybe it's overkill to mark each start/end and it's good enough to just track the boundaries. But it seems to me that it might be more difficult to track a specific phase's time (say, escape analysis) if some of the other phases move around. Also, it would seem more difficult to attach bytes (to compute things like source bytes parsed per unit of time, etc.).
On the other hand, simpler is often better. Maybe it is good enough to just track transition times. Still exploring.
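As a usage sketch building on the hypothetical Group/Timing code above (placed in that same assumed package), attaching bytes and deriving a parse throughput could look like:

// throughput is a hypothetical example: attach the number of source bytes parsed
// to the timing and derive source bytes parsed per second.
func throughput(g *Group, srcBytes int64) float64 {
	t := g.AddTiming("parse")
	// ... the parser would run here, consuming srcBytes bytes of source ...
	t.Bytes = srcBytes
	t.Stop()
	return float64(t.Bytes) / t.Duration.Seconds()
}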
The ssa phases can all be timed. For example:
go build -gcflags -d=ssa/all/time step_hide.go
# command-line-arguments
./step_hide.go:13: early phielim TIME(ns) 9458 foo
./step_hide.go:13: early copyelim TIME(ns) 3365 foo
./step_hide.go:13: early deadcode TIME(ns) 288821 foo
./step_hide.go:13: short circuit TIME(ns) 293607 foo
./step_hide.go:13: decompose user TIME(ns) 303039 foo
./step_hide.go:13: opt TIME(ns) 2593795 foo
I think the format needs work; it doesn't seem to conform to what I recall Austin telling me is the desired form for the benchmarking commands we already have. Right now the function name comes last because function names sometimes have exciting syntax in them.
@mdempsky, @josharian Please have a look at https://go-review.googlesource.com/24462. After considering the feedback above I decided to abandon the structured data idea (it seemed like overkill) and go with a pretty simple approach, essentially along the lines of @mdempsky's proposal. I expect this to morph a bit over time, but it's a starting point.
CL https://golang.org/cl/24462 mentions this issue.
> As for format, one of my takeaways from @aclements' presentation about github.com/aclements/go-gg was that the benchmark format may be sufficient as long as we adequately structure the benchmark names. E.g., see the examples with names like BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8.
>
> /cc @aclements for thoughts on data format
I don't know exactly what data you want to represent, but the "extended" benchmark format is pretty flexible. It's not great at hierarchical things, but it's not clear you really want that. And, of course, you'll be able to use all of the tools that know how to speak it (which isn't much yet, but the format is young). You could have a benchmark line for each (file, phase) where the phase is a property, and results for time, allocations, etc. You might just want to name each benchmark after each phase. Something like
BenchmarkPhielem/file:step_hide.go/gomaxprocs:4 1 9458 ns/op 100 bytes-allocated 2 allocations
Maybe you want the function in there. That may be too detailed, but it would be easy to write a tool that aggregated it in various ways after the fact, e.g., using https://godoc.org/github.com/aclements/go-misc/bench.
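A minimal sketch of emitting such a line from Go; writePhaseBench and its parameters are hypothetical, and only the line layout follows the extended benchmark format example above:

package main

import (
	"fmt"
	"io"
	"os"
	"runtime"
	"time"
)

// writePhaseBench writes one extended-benchmark-format line for a compiler phase,
// encoding properties (file, gomaxprocs) into the benchmark name and appending
// extra metrics after the standard ns/op field.
func writePhaseBench(w io.Writer, phase, file string, d time.Duration, bytesAlloc, allocs int64) {
	fmt.Fprintf(w, "Benchmark%s/file:%s/gomaxprocs:%d 1 %d ns/op %d bytes-allocated %d allocations\n",
		phase, file, runtime.GOMAXPROCS(0), d.Nanoseconds(), bytesAlloc, allocs)
}

func main() {
	// Values taken from the example line above.
	writePhaseBench(os.Stdout, "Phielem", "step_hide.go", 9458*time.Nanosecond, 100, 2)
}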
The pending CL now enables writing to a file and is using the benchmark data format.
This seems to be done.
Right now we can use compilebench to measure the compiler's overall run time, and we can use -cpuprofile to measure fine-grained per-function/instruction profiles. But with the interest in continuing to evolve/rewrite individual phases of the compiler, I think it would be nice to track the compiler's performance on a per-phase granularity.
Rough proposal sketch: