catapult-project / catapult

Deprecated Catapult GitHub. Please instead use the http://crbug.com "Speed>Benchmarks" component for bugs and https://chromium.googlesource.com/catapult for downloading and editing source code.

Colab support #3806

Open benshayden opened 7 years ago

benshayden commented 7 years ago

Results.html and the chromeperf dashboard are optimized for some common workflows for working with performance data, but there will always be use-cases that we can't support through them. Colab (inside google / outside google) can offer an escape valve to allow power users to write some code to leverage data from telemetry or the dashboard or other sources in their own custom way. Colab can also provide a testing ground for users to demonstrate features that should be added to another product. Here are some initial feature ideas.

This sounds like fun, but will probably be low priority for a while, so no scheduling guarantees.

@pasko Any other ideas?

pasko commented 7 years ago

Thank you for the ideas, Ben!

fwiw, I am leaning more towards telemetry producing raw output and a library of colabs to pull it in (rather than creating a colab). This makes more sense for power users, as everyone's knowledge of colab is evolving :)

Also, having turducken code in telemetry (JS in HTML in Python in Python) would be unpleasant to maintain.

On Tue, Aug 29, 2017 at 1:09 AM, Ben Hayden notifications@github.com wrote:

Results.html (https://github.com/catapult-project/catapult/blob/master/docs/metrics-results-ui.md) and the chromeperf dashboard (http://chromeperf.appspot.com/) are optimized for some common workflows for working with performance data, but there will always be use-cases that we can't support through them. Colab (outside google: http://colaboratory.jupyter.org/welcome/ / inside google: http://colab) can offer an escape valve to allow power users to write some code to leverage data from telemetry or the dashboard or other sources in their own custom way. Colab can also provide a testing ground for users to demonstrate features that should be added to another product. Here are some initial feature ideas.

  • run_benchmark --output-format=colab to automatically create a new colab notebook containing telemetry results, or append new results to an existing colab notebook with --notebook=
  • Export {raw, merged} {json, csv} from results.html to facilitate uploading them to colab notebooks
  • Provide a notebook of snippets to demonstrate capabilities
  • Provide a kernel containing catapult's python Histogram system so it doesn't need to be embedded into notebooks
  • Support colab's charts in the python Histogram system
    • histogram bar charts
    • breakdown bar charts
    • line charts
    • parallel coordinate charts
    • scatter charts

Those examples would be nice! I am thinking of common notebooks for my own set of metrics/benchmarks/bots. It would be different from anyone else's.


This sounds like fun, but will probably be low priority for a while, so no scheduling guarantees.

@pasko (https://github.com/pasko) Any other ideas?

FYI, something I wrote recently: http://go/pasko-study-eager-cookies. It was impossible to do with perf trybots, but that is a separate issue. One issue with json format was that I could not separate about:blank samples from those coming from loads of bbc.co.uk.

-- Egor Pasko

benshayden commented 7 years ago

Thanks!

Also, having turducken code in telemetry (JS in HTML in Python in Python) would be unpleasant to maintain.

I was just discussing this with @eakuefner. We tried hard to follow trace-viewer's lead and maintain only 1 implementation of TBM2 in JS, and I think we can avoid most of the problems of turducken code, but the nature of the perf test harnesses right now seems to necessitate implementations of most TBM2 features in JS, Python, and C++, so I'm currently working to simplify them: #3507, #3761. I can go into more detail if you want to schedule a VC.

One issue with json format was that I could not separate about:blank samples from those coming from loads of bbc.co.uk.

Yeah, I just noticed that problem, too. I'll file a bug about propagating metadata into sample diagnostics when merging Histograms.

perezju commented 7 years ago

For what it's worth, I've been having some success by taking the histograms json, doing the metadata propagation myself, and then dumping into some form of csv to finally load from within a colab.
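For illustration, a minimal sketch of that kind of json-to-csv conversion, assuming the histograms JSON is a flat list of dicts in which real histograms carry "name", "unit" and "sampleValues" keys (treat the field names as assumptions and adjust to the actual dump):

```python
import csv
import json

# Sketch: flatten a histograms JSON dump into one CSV row per sample value.
# Entries without "sampleValues" (e.g. shared diagnostics) are skipped.
with open("histograms.json") as f:
    entries = json.load(f)

with open("samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["metric", "unit", "sample"])
    for entry in entries:
        if "sampleValues" not in entry:
            continue
        for sample in entry["sampleValues"]:
            writer.writerow([entry["name"], entry.get("unit", ""), sample])
```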

I did not know (or maybe I did but then forgot?) that results.html allows exporting to csv too. The exported csv looks pretty neat and seems to contain all the data I could potentially need.

Do you think we need something else other than this csv format to play around with the data in colabs?

pasko commented 7 years ago

On Tue, Sep 5, 2017 at 10:36 AM, Juan A. Navarro Pérez notifications@github.com wrote:

For what it's worth, I've been having some success by taking the histograms json, doing the metadata propagation myself, and then dumping into some form of csv to finally load from within a colab.

I did not know (or maybe I did but then forgot?) that results.html allows to export to csv too. The exported csv looks pretty neat and looks like it contains all the data I could potentially need.

Do you think we need something else other than this csv format to play around with the data in colabs?

hm, in that CSV I found only aggregated stats (like averages); is it different for you? Can it depend on the benchmark?

For colab I usually need raw data, ideally one row per benchmark run, with each metric in a dedicated column, but the actual format does not matter as long as this information can be restored with a short script.
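As a rough sketch of the kind of short script meant here, a pandas pivot could restore the one-row-per-run shape from a long-format sample dump like the one above; the "run" column is hypothetical and stands in for whatever identifies a benchmark run in the data:

```python
import pandas as pd

# Sketch: turn long-format samples (one row per sample value) into one row
# per benchmark run with a column per metric. "run" is a hypothetical column;
# substitute whatever identifies a run in your export.
samples = pd.read_csv("samples.csv")  # e.g. columns: run, metric, sample
wide = samples.pivot_table(index="run", columns="metric",
                           values="sample", aggfunc="mean")
print(wide.head())
```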

-- Egor Pasko

perezju commented 7 years ago

I played a bit with it now, after creating this results.html by running the start_with_url.cold.startup_pages benchmark, and noted that:

Anyway, I agree this behavior is strange and confusing. I would expect the "download CSV" to just download each and every histogram for all (metric, story, story_repeat) combinations; not just "download selection", which is sort of what the current implementation does.

Maybe this is the intended distinction between downloading "raw/merged" histograms in #3838?

benshayden commented 7 years ago

Thanks for the feedback! Keep it coming!

Do you think we need something else other than this csv format to play around with the data in colabs?

I'm imagining providing a colab kernel containing the Histogram python library so that you can

  1. download the raw/merged JSON from a results.html,
  2. upload that to the kernel, and
  3. slice the histograms and diagnostics however you want in python with a high-level API.

Does that sound like it might work?
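As a rough sketch of what steps 2 and 3 could look like from inside a colab notebook today: the upload helper is colab's stock google.colab.files widget, and the slicing below works on the plain JSON dicts rather than the proposed high-level Histogram API (the metric name is only an example):

```python
import json
from google.colab import files  # colab's built-in upload widget

# Step 2: upload the raw/merged JSON exported from results.html.
uploaded = files.upload()
name = next(iter(uploaded))
entries = json.loads(uploaded[name])

# Step 3: slice however you want, e.g. collect sample values for one metric.
metric = "timeToFirstContentfulPaint"  # example metric name
samples = [s for e in entries
           if e.get("name") == metric
           for s in e.get("sampleValues", [])]
print(len(samples), "samples for", metric)
```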

Ideally, using colab to process benchmark results should only be necessary for power users and their edge use cases. Results.html should support the more common exploration use cases. Long term, we could work out a path to add features that were pioneered by colabbers to results.html and the dashboard. Key words: long term and could.

in that CSV I found only aggregated stats

Yep, the CSV format does not contain sample values because there could be hundreds of them in each Histogram. CSV is a middle-ground between human-readable and machine-readable. All the juicy details like sample values and diagnostics are in the machine-readable-only JSON format.

Maybe this is the intended distinction between downloading "raw/merged" histograms in #3838?

Yes! "story_repeat" is called storysetRepeats now. There are only 5 raw histograms in that results.html. None of them contain storysetRepeats because there's no easy way to plumb that metadata for legacy benchmarks such as start_with_url. Is there a plan to migrate that benchmark to TBM2?

To clarify, perezju's "histogram for all (metric, story, story_repeat) combinations" is the "raw" results, and pasko's "one row per benchmark run" is the "merged" results because each benchmark run can contain multiple stories and story runs. You can group by benchmarkStart in order to merge results across stories and storysetRepeats but not benchmark runs. You're right, it is complicated to mentally work out pivot tables like this. I think it will be much easier for you to do whatever you want with a high-level python API in colab.
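A rough sketch of that grouping over the raw JSON, assuming the benchmarkStart diagnostic has already been resolved to a plain value (in the raw dump, diagnostics are often shared by GUID reference, so a real script would resolve those first):

```python
from collections import defaultdict

# Sketch: pool sample values per (metric, benchmark run), keyed on the
# benchmarkStart diagnostic, so stories and storysetRepeats are merged
# but separate benchmark runs are kept apart.
def group_by_run(entries):
    groups = defaultdict(list)
    for e in entries:
        if "sampleValues" not in e:
            continue
        run = e.get("diagnostics", {}).get("benchmarkStart")
        groups[(e["name"], run)].extend(e["sampleValues"])
    return groups
```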

Does that help?

pasko commented 7 years ago

On Sep 5, 2017 18:55, "Ben Hayden" notifications@github.com wrote:

Thanks for the feedback! Keep it coming!

Do you think we need something else other than this csv format to play around with the data in colabs?

I'm imagining providing a colab kernel containing the Histogram python library (https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/histogram.py) so that you can

  1. download the raw/merged JSON from a results.html,
  2. upload that to the kernel, and
  3. slice the histograms and diagnostics however you want in python with a high-level API.

Does that sound like it might work?

This adds a dependency and potentially the friction of integrating it into colab and other environments. I agree that it would work for this use case. Thanks.

Asking for another /pony: an archive with all traces, so that there is a 1:1 correspondence between csv row and a trace file (for more detailed automated analysis).

Ideally, using colab to process benchmark results should only be necessary for power users and their edge use cases. Results.html should support the more common exploration use cases.

This is not necessarily ideal as it reinvents colab with less flexibility. Why not just use colab all the time?

Long term, we could work out a path to add features that were pioneered by colabbers to results.html and the dashboard. Key words: long term and could.

Is this essentially rewriting colab python into results.html javascript? Why spend time on it? I believe everyone is capable of changing a path in a colab notebook (corresponding to a specific benchmark run) and running it.

in that CSV I found only aggregated stats

Yep, the CSV format does not contain sample values because there could be hundreds of them in each Histogram. CSV is a middle-ground between human-readable and machine-readable. All the juicy details like sample values and diagnostics are in the machine-readable-only JSON format.

Maybe this is the intended distinction between downloading "raw/merged" histograms in #3838 (https://github.com/catapult-project/catapult/issues/3838)?

Yes! "story_repeat" is called storysetRepeats https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/diagnostics/reserved_names.html#L46 now. There are only 5 raw histograms in that results.html https://github.com/catapult-project/catapult/files/1277006/results.html.zip. None of them contain storysetRepeats because there's no easy way to plumb that metadata for legacy benchmarks such as start_with_url. Is there a plan to migrate that benchmark to TBM2?

To clarify, perezju's "histogram for all (metric, story, story_repeat) combinations" is the "raw" results, and pasko's "one row per benchmark run" is the "merged" results because each benchmark run can contain multiple stories and story runs. You can group by benchmarkStart (https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/diagnostics/reserved_names.html#L23) in order to merge results across stories and storysetRepeats but not benchmark runs. You're right, it is complicated to mentally work out pivot tables like this. I think it will be much easier for you to do whatever you want with a high-level python API in colab.

Does that help?


perezju commented 7 years ago

(benshayden) I'm imagining providing a colab kernel containing the Histogram python library

I honestly would just be happy with a quick and easy way to get the "raw" csv out of a results.html file.

Not sure about @pasko's workflow; but I imagine colab users wanting to get the data as quickly as possible into a DataFrame (and csv is great for that) to then do all the stats, grouping and dicing over there.

It might be nice to have, but I don't see myself doing much directly with the Histogram library itself.
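For what that csv-into-a-DataFrame workflow could look like, a minimal sketch; the column names (metric, story, sample) are illustrative guesses rather than the exact headers of the results.html export:

```python
import pandas as pd

# Sketch: load the exported CSV and do the stats/grouping in pandas.
# Column names are illustrative; adjust to the real export headers.
df = pd.read_csv("results.csv")
summary = (df.groupby(["metric", "story"])["sample"]
             .describe()[["count", "mean", "std"]])
print(summary)
```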

(benshayden) Long term, we could work out a path to add features that were pioneered by colabbers to results.html and the dashboard.

+1 to that.

(pasko) Asking for another /pony: an archive with all traces, so that there is a 1:1 correspondence between csv row and a trace file (for more detailed automated analysis).

+1000 to that! But I think you're already working on that. Basically being able to correlate each value with the trace it comes from. And having all traces in a neat package rather than all spilled over some random directory.

pasko commented 7 years ago

answering from gmail on Android broke quoting, sorry about that, won't do again

anniesullie commented 7 years ago

(pasko) Asking for another /pony: an archive with all traces, so that there is a 1:1 correspondence between csv row and a trace file (for more detailed automated analysis).

+1000 to that! But I think you're already working on that. Basically being able to correlate each value with the trace it comes from. And having all traces in a neat package rather than all spilled over some random directory.

@dave-2 @simonhatch I wonder if we'd also like to be able to correlate value<->trace in pinpoint.