Writing this issue out of the GopherCon 2019 contribution workshop, as a sort of umbrella task to encompass various limitations in the current `go tool trace`, and to speculate about possible solutions.
**NOTE:** most of these remarks are as of Go 1.11, which was the latest version that I reconciled into an experimental prototype fork. All feedback on what I've missed in the 1.12 or upcoming 1.13 trace tool code is more than welcome. As I write this issue, I'm skimming through the code on master to try to verify my statements, but the last time that I did a truly deep look was 1.11.
## Problems
- **Event Parsing:** the `internal/trace` parser used by the current tool is a rather slow two-phase parser; being internal, it also cannot be reused by other tools interested in processing trace files.
- **Event Processing:** the trace tool currently has two separate state machines for processing events: one for generating catapult-compatible JSON, the other for generating a statistics table.
- **Lack of Export:** other than looking at the data in the catapult viewer, or scanning the HTML statistics table, there's no easy or good way to export data for analysis in something like the PyData stack (e.g. I've found loading a CSV file into a pandas DataFrame to be quite effective for exploration).
- **Viewer Chunking:** finally, and perhaps most superficially, the main reason for splitting traces (chunking them into different time sections), as far as I can tell, is the 100MB limit on strings in V8 with respect to JSON parsing. This bites largely because the catapult JSON data is typically 10x to 100x larger than the native binary data.
## Proposed Improvements
### Parsing (library?)
A `golang.org/x/trace` library for dealing with trace files would be great; things it could cover:
- access to the raw events themselves, through something like a streaming event reader
- a separate event sorter for restoring overall wallclock order (since events are necessarily recorded in parallel batches)
- a separate higher-level event type, maybe with links: part of why the current parser is a two-pass affair is that it creates convenience singly-linked chains of events on top of the raw events read from the file
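To make the streaming-reader idea concrete, here is a minimal sketch of what such an API might look like. Everything here (the `Event` fields, the `EventReader` interface, the in-memory implementation) is illustrative assumption, not the actual `internal/trace` representation:

```go
package main

import "fmt"

// Event is a hypothetical decoded trace event; the fields are
// illustrative, not the real internal/trace layout.
type Event struct {
	Ts   int64  // timestamp in nanoseconds
	Type string // e.g. "GoCreate", "GoStart", "GoBlockRecv"
	G    uint64 // goroutine ID the event concerns
}

// EventReader sketches the streaming interface such a library might
// expose: pull one event at a time rather than materializing the file.
type EventReader interface {
	// Next returns the next raw event in batch order, or an error
	// (e.g. end of stream).
	Next() (Event, error)
}

// sliceReader is a trivial in-memory implementation for illustration.
type sliceReader struct {
	events []Event
	i      int
}

func (r *sliceReader) Next() (Event, error) {
	if r.i >= len(r.events) {
		return Event{}, fmt.Errorf("end of stream")
	}
	ev := r.events[r.i]
	r.i++
	return ev, nil
}

func main() {
	var r EventReader = &sliceReader{events: []Event{
		{Ts: 10, Type: "GoCreate", G: 42},
		{Ts: 25, Type: "GoStart", G: 42},
	}}
	for {
		ev, err := r.Next()
		if err != nil {
			break
		}
		fmt.Printf("%d %s g%d\n", ev.Ts, ev.Type, ev.G)
	}
}
```

A pull-based reader like this would let the sorter and the higher-level linking layer compose on top as separate stages, instead of being baked into one two-pass parser.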
Many of these improvements could be made in place inside `internal/trace`, but are perhaps better explored first / concretely proposed through a standalone personal repository, eventually perhaps as a `golang.org/x` package.
### Event Processing (library?)
However, much of what you need when dealing with trace data is a state machine that tracks the state of various Go runtime objects. There are at least two separate ones within the current trace tool: one for creating catapult JSON, the other for tabulating statistics.

This state machine, just like the events themselves, is highly dependent on the Go runtime, and necessarily needs updates for new versions. I expect there can be a vigorous discussion around which layer is best to export in a library: should it be the parser above (low or high level), or instead something more processed, like spans of goroutine-relevant time?
### Data Export Mode (tool change?)
Library or not, making the trace tool usable in an offline way to simply transform the binary data into a form convenient for consumption (by Unix tools, by pandas, etc.) has proven quite useful in my experimental fork.
One useful pivot on the raw events is to group them into spans of goroutine-relevant time, e.g. (conversationally):
- G42 spent X time waiting after create (while runnable)
- G42 then spent X time executing on CPU
- G42 then spent X time blocking on a channel receive
- G42 then spent X time waiting to run after being unblocked by G99
- G42 then spent X time executing on CPU
- ... and so on
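As data, each of those conversational lines is just one span record. A minimal sketch, with field names that are my own assumptions rather than any existing API:

```go
package main

import "fmt"

// GSpan is one illustrative row of the "goroutine-relevant time"
// pivot described above; names and fields are assumptions.
type GSpan struct {
	G        uint64 // goroutine ID, e.g. 42
	State    string // "runnable", "running", "waiting", ...
	Start    int64  // span start, nanoseconds since trace start
	Duration int64  // time spent in State
	OtherG   uint64 // related goroutine (creator/unblocker), 0 if none
}

func main() {
	spans := []GSpan{
		{G: 42, State: "runnable", Start: 0, Duration: 120},
		{G: 42, State: "running", Start: 120, Duration: 500},
		{G: 42, State: "waiting", Start: 620, Duration: 300, OtherG: 99},
	}
	for _, s := range spans {
		fmt.Printf("G%d spent %dns %s\n", s.G, s.Duration, s.State)
	}
}
```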
Basically, you can think of goroutines as going through a state machine (a direct analog of the classic Unix process state machine from Bach '86).
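A simplified sketch of that state machine follows; the state set and the transition table here are my own illustrative reduction, not the runtime's actual (larger) set of goroutine statuses:

```go
package main

import "fmt"

// GState enumerates goroutine states in the Bach-style diagram,
// deliberately simplified for illustration.
type GState int

const (
	Runnable GState = iota // created or unblocked, waiting for a P
	Running                // executing on CPU
	Waiting                // blocked (channel, syscall, GC assist, ...)
	Dead                   // exited
)

func (s GState) String() string {
	return [...]string{"runnable", "running", "waiting", "dead"}[s]
}

// validNext encodes the legal transitions of this sketch.
var validNext = map[GState][]GState{
	Runnable: {Running},
	Running:  {Runnable, Waiting, Dead}, // preempted, blocked, or exited
	Waiting:  {Runnable},                // unblocked by another goroutine
}

func canTransition(from, to GState) bool {
	for _, s := range validNext[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Runnable, Running)) // legal
	fmt.Println(canTransition(Waiting, Running))  // illegal: must be runnable first
}
```

Each trace event then becomes a transition in this machine, and the time between transitions is exactly the span duration from the list above.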
A really fruitful dataset, then, is a table of time spent by each goroutine in each state:

- not pre-summed, as it is in the current table of statistics
- with an associated goroutine ID for relationships like creation and unblocking
- also with a corresponding stack ID, which can be resolved in an ancillary table of stack frames
### Better Trace Viewing
Now that most JavaScript runtimes support streaming, perhaps we could do better than static 100MB chunks.
As even more of a green-field idea, maybe we could use something like http://gioui.org to build a dedicated trace viewer that wouldn't be held back by the limitations of DOM parsing or rendering.