TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License
2.58k stars 183 forks source link

Profiling CPU usage of differential programs #166

Open ryzhyk opened 5 years ago

ryzhyk commented 5 years ago

I would like to understand where my differential program spends time. Is my best bet to just use a generic CPU profiler or is there some sort of API for this in timely/differential? Using a generic profiler does not seem very promising, as the particular program I am looking at has thousands of operators, so it will be hard to correlate call stacks with specific operators.

In the past, the tracing API proved immensely useful in optimizing the memory footprint of my programs (#113). I was hoping to use something similar for CPU profiling.

frankmcsherry commented 5 years ago

Indeed there is!

Probably the best worked example at the moment is @utaal's timely-viz repository, whose dashboard.rs example grabs the timely logging stream and pulls out scheduling information for operators to report on how long each operator is taking.

It could use a bit of polish, though my understanding is that @li1 is also currently working in a proximate space (using timely logs to get out performance information). @comnik et al may have this data linked into 3DF, and it would be interesting if that system could be used as an interactive profiler for other timely computations...

comnik commented 5 years ago

That is a goal for 3DF, although I haven't tested that in practice. 3DF can query its own logging streams, or logging streams from dataflows running on the same workers, but it's lacking a remote source a la timely-viz.

ryzhyk commented 5 years ago

This is really helpful, thank you! timely-viz seems very nice, but it might be an overkill for my purposes. Looking at the code of timely-viz, it appears that all I need to do to get a breakdown of differential runtime is to subscribe to timely::logging::ScheduleEvent and add up all the intervals per operator. Is this correct or is there more to it?

frankmcsherry commented 5 years ago

If you snag the linked fragment, it should do this for you. I recommend rounding the timestamps, to decrease the update frequency.

Sent from my iPhone

On Apr 9, 2019, at 18:27, Leonid Ryzhyk notifications@github.com wrote:

This is really helpful, thank you! timely-viz seems very nice, but it might be an overkill for my purposes. Looking at the code of timely-viz, it appears that all I need to do to get a breakdown of differential runtime is to subscribe to timely::logging::ScheduleEvent and add up all the intervals per operator. Is this correct or is there more to it?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

ryzhyk commented 5 years ago

I did something even simpler and just wrote a simple handler that aggregates timestamps without using differential. I obtained a profile that looks something like this:

.Dataflow 0:                                                   3s 844231us
.recursive component 227:                                        1s 526638us
.Map 239:                                                     0s 662887us
.Map 215:                                                       0s 468107us
.Iterate 81:                                                      0s 359447us
.Reduce 219:                                                    0s 354180us
.Join 194:                                                        0s 276514us
.Reduce 65:                                                     0s 223143us
.Reduce 201:                                                   0s 216287us
.ThresholdTotal 243:                                          0s 121061us
.Join 233:                                                      0s 109519us
.Arrange 241:                                                 0s 73108us
.Arrange 130:                                                 0s 64692us
....

This seems to suggest some hierarchical structure, e.g., Dataflow 0 above most likely contains all of the other operators; likewise recursive component 1 is an aggregation of multiple operators.

Is there a way to re-construct the operator tree automatically for more meaningful profiling?

Thanks!

frankmcsherry commented 5 years ago

Yes, in the Operate event type there should be an address, which is a description of the operators location in a tree of dataflows/scopes/operators. It should reveal the hierarchical structure to you. There is also an operator index, which you should be able to correlate with the index up above.

comnik commented 5 years ago

Somewhat adapted from timely-viz, in https://github.com/comnik/declarative-dataflow/blob/master/src/sources/timely_logging.rs#L77 I turn operator addresses back into edges in the operator graph. Not sure how helpful this is, but this is the extent of what I've been doing.

ryzhyk commented 5 years ago

Thanks, @frankmcsherry, @comnik!

utaal commented 5 years ago

Hi @ryzhyk, @li1 is looking into extending our trace-based online performance diagnostics work[1] and he also needs to re-construct the operator tree to then extract profiling and tracing information from the logs. https://github.com/utaal/timely-viz/blob/master/src/bin/graph.rs#L105-L114 should get you the leaf nodes with differential (by anti-joining the full collection of operator addresses with the same collection where the last element of the address has been stripped off). I use this in timely-viz to draw the dataflow graph of the "actual" operators (those that don't represent a subscope). We are looking into extending the perf diagnostic work, and it would be great to know what your experience has been with timely logging, and what you'd like to see from a dataflow trace analysis. If you're interested, we can chat more about what our current plans are and see if they align with what you're looking for: feedback from concrete use cases is really helpful. /cc @vasia

[1] https://www.usenix.org/node/211258

li1 commented 5 years ago

Hi @ryzhyk, as @utaal and @frankmcsherry mentioned, I'm looking into online profiling (in particular performance optimization & debugging) of timely & differential using log analysis.

So if you have any questions regarding how the logs work (e.g. how to interpret the various TimelyEvent messages) or how you could extract some kind of insight (e.g. explaining performance bottlenecks) from them, just let me know!

ryzhyk commented 5 years ago

@utaal, @li1, many thanks for offering your help!

At this point I was able to reconstruct the operator tree and compute aggregate runtimes of individual operators. I am still adding some DDlog-specific instrumentation to correlate operators back to the DDlog program source.

The timely logging infrastructure has been incredibly useful so far in that I was able to do all of the above with only few lines of code and without running into any issues.

My next step is to actually use this profiling information to optimize my programs. I would definitely be interested in your experience there, but also realize there is no general recipe for speeding things up. You will definitely hear from me in the coming weeks as I try to figure out why my code is taking longer than expected :)

utaal commented 5 years ago

Hi @ryzhyk, quick update: there's now a tool at TimelyDataflow/diagnostics to render dataflow graphs in the browser without additional dependencies. Installation is as easy as cargo install tdiag, more info on usage in the repo and with --help.