Closed dave-2 closed 5 years ago
@simonhatch: (on https://chromium-review.googlesource.com/c/catapult/+/1087595)
This looks like it's a summary metric, no? https://chromeperf.appspot.com/group_report?bug_id=846864
My guess is this is just a byproduct of the differences in how histograms do summarization. Looking at https://chromeperf.appspot.com/report?sid=bfecd6a30c8691a8d0c7a1ec048e309681a6776bfbb5d8a1fbbfcd36be543607&start_rev=551859&end_rev=563958, no individual pages move but the summary as a whole does (histograms summarizes slightly differently, this is known and was announced a while back).
Ethan, is that correct?
@eakuefner
Yes, this is a summary metric. The reason Pinpoint cannot find it is because it is not present in the HistogramSet as produced by Telemetry. Instead, it is only produced by add_reserved_diagnostics, which runs after Telemetry runs but before the data is uploaded to the perf dashboard.
Probably this is WAI, and possibly we could show a warning on the Dashboard side (or perhaps even outright disallow this, if we know we're going to be using histograms on the Pinpoint side, which I think is now the default, IIRC).
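To illustrate the point above: this is a purely hypothetical sketch (not the actual catapult code; the data shapes and the `add_summaries` helper are invented) of a post-processing step like add_reserved_diagnostics, which merges per-story histograms into a summary that only exists after Telemetry has finished running.

```python
from collections import defaultdict

def add_summaries(histograms):
    """histograms: list of dicts like {'name': ..., 'story': ..., 'samples': [...]}.

    Returns the input plus one merged summary entry per metric name
    (marked here with story=None)."""
    by_name = defaultdict(list)
    for h in histograms:
        by_name[h['name']].extend(h['samples'])
    summaries = [{'name': name, 'story': None, 'samples': samples}
                 for name, samples in by_name.items()]
    return histograms + summaries

# What Telemetry itself produces: per-story histograms only.
telemetry_output = [
    {'name': 'cpu_time', 'story': 'page_a', 'samples': [10.0]},
    {'name': 'cpu_time', 'story': 'page_b', 'samples': [30.0]},
]

# The summary (story=None) appears only after this post-processing step,
# which is why Pinpoint, reading raw Telemetry output, cannot find it.
merged = add_summaries(telemetry_output)
```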
Does that mean there will be alerts that can't be bisected on?
Yes, but I think in that case, it would be worthwhile to audit the alerting configs that are producing these un-bisectable alerts. As you know, in general our team does not recommend alerting on summary metrics because they are bound to be noisy as a matter of course.
Ideally, as @simonhatch and I have discussed offline with @nedn, we'd move past the current model where we upload these summaries explicitly and store them as timeseries, to something more lightweight where we fetch and maybe cache summaries that matter (e.g. because they are featured in a V2SPA releasing report). I think @benshayden has done a little bit of thinking around the idea of merging histograms on the fly in V2SPA, so maybe it's time we had a conversation around "summaries and how to get rid of them".
V2SPA already merges timeseries on the fly. It doesn't yet afford starting pinpoint jobs from the chart, but when it does, I can make it require the user to drill down into a test-case-specific timeseries first.
I'm not sure if we'd want to remove summary timeseries from the database. It's much faster to load a single timeseries than 20 or 100. If we can refactor TestMetadata from test paths to descriptors, I might actually like to store timeseries that summarize across all bots so that users don't need to select a bot when browsing charts -- they could choose to drilldown through related measurements or test case sparklines first if they prefer.
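A rough sketch of the kind of on-the-fly merging described above (this is illustrative only, not the V2SPA implementation; the function name and data shapes are invented): instead of loading a precomputed summary timeseries, average several per-test-case timeseries pointwise at each revision.

```python
def merge_timeseries(series_list):
    """series_list: list of {revision: value} dicts, one per test case.

    Returns a single merged {revision: mean_value} timeseries. Revisions
    missing from some series are averaged over the series that have them."""
    buckets = {}
    for series in series_list:
        for rev, value in series.items():
            buckets.setdefault(rev, []).append(value)
    return {rev: sum(vals) / len(vals) for rev, vals in buckets.items()}

# Two per-page timeseries merged into one summary-like series.
merged_series = merge_timeseries([{1: 2.0, 2: 4.0}, {1: 4.0}])
```

The cost concern in the comment above is visible here: merging N series means fetching all N from storage first, which is why a stored summary timeseries may remain a necessary optimization.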
Auditing alerting configs to remove summary timeseries SGTM. Is there a bug for that?
Should we simply disallow summary-metric bisects from the dashboard UI? Or show a big fat warning plus suggestions on what to do instead?
https://pinpoint-dot-chromeperf.appspot.com/job/169a849b240000
https://pinpoint-dot-chromeperf.appspot.com/job/15b12d63240000
https://pinpoint-dot-chromeperf.appspot.com/job/11e7a563240000
https://pinpoint-dot-chromeperf.appspot.com/job/1286a763240000
https://pinpoint-dot-chromeperf.appspot.com/job/16dbb8bb240000
https://pinpoint-dot-chromeperf.appspot.com/job/11a876ab240000
https://pinpoint-dot-chromeperf.appspot.com/job/16e5e483240000
https://pinpoint-dot-chromeperf.appspot.com/job/16d53afd240000
https://pinpoint-dot-chromeperf.appspot.com/job/12fb718b240000
https://pinpoint-dot-chromeperf.appspot.com/job/11dd3eab240000
Ping, this is the leading cause of Pinpoint failures.
https://pinpoint-dot-chromeperf.appspot.com/job/11a055c0a40000
https://pinpoint-dot-chromeperf.appspot.com/job/12870ef3240000
https://pinpoint-dot-chromeperf.appspot.com/job/119d9c3f240000
https://pinpoint-dot-chromeperf.appspot.com/job/16b84257240000
https://pinpoint-dot-chromeperf.appspot.com/job/12fa2ed3240000
https://pinpoint-dot-chromeperf.appspot.com/job/14b4876b240000
Uhm, I have to object to the decision to simply disallow bisection on summary metrics (though maybe I have a different problem that can be solved in other ways).
For runtime-call stats we have sub/summed metrics precisely for the purpose of reducing noise. Here is the example hierarchy:
Usually the regression on the top-level V8-Total metric is too small, while you get a very stable and strong signal on the sub-metric. That is the very reason why, at least in our case, we do want to bisect on the sub-metric. Note that V8-Total is the sum of all V8 metrics, and that each V8 sub-metric is the sum of many small sub-sub-metrics.
I'm currently unable to address the majority of detected regressions manually because of this bug, since there are too many V8 CLs in the revision range.
Hmm, can you clarify that a bit more? In your example, why can't you find a sub-metric to bisect on instead of the summary metric? You mentioned that you disagree with disallowing bisection on summary metrics, but your example goes on to choose a sub-metric precisely because the summary regression is too small.
I hope I get this right :).
Some background first: to clarify, V8-Total is the sum of all V8 component metrics (V8-C++, V8-Compiler, V8-GC, ...), and those metrics themselves are sums of the actual measurement scopes from within V8 (for V8-C++ this might be something like V8-Array-join, V8-Allocate-Array, ...). In total we have on the order of 200 low-level metrics, too many to track, so we don't have alerts on them and hardly ever want to bisect at that level.
For the next-gen benchmarks we will have a similar structure of metrics for the V8 memory: V8-Total-memory / (V8-new-space, V8-old-space...) / (memory usage by instance type).
Bisection usually happens on the summary metrics over V8 components (V8-C++, V8-Compiler and friends) since those conveniently overlap with sub-team responsibilities.
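The two-level roll-up described above can be sketched like this (illustrative only; the scope names and numbers are made up, not real runtime-call-stats data): low-level scopes sum into per-component metrics, which in turn sum into V8-Total.

```python
# Hypothetical low-level measurement scopes, grouped by V8 component.
SCOPES = {
    'V8-C++':      {'V8-Array-join': 5.0, 'V8-Allocate-Array': 3.0},
    'V8-Compiler': {'V8-Parse': 2.0, 'V8-Optimize': 4.0},
}

# Component metrics (V8-C++, V8-Compiler, ...) are sums of their scopes;
# these are the level teams usually alert and bisect on.
components = {name: sum(subs.values()) for name, subs in SCOPES.items()}

# V8-Total is the sum of all component metrics -- often too small and
# too noisy a signal to bisect on directly.
v8_total = sum(components.values())
```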
Do the following statements make sense?
Gotcha, so to summarize: there could be a lot of per-page metrics to dig through, so it makes more sense for you to alert on the summary and then bisect on either a per-page metric or a sub-summary.
If we were to switch to per-page alerts, we do have more intelligent grouping available now that should be able to group alerts from related metrics together; is that still less desirable?
We could try out how this works on v8.browsing_stories; I don't have a good notion of what would change.
@benshayden Is the metric relationship data still hardcoded, or can you fetch it from the diagnostic now? How much work is it to add v8's?
First, to clarify "removing summary timeseries": I'd like to investigate performance a bit before removing them from the database. It might be unbearably slow to fetch 100 timeseries and merge them -- storing summary timeseries might be a necessary performance optimization. In that case, it might make architectural sense to merge them in add_histograms_queue instead of requiring uploaders to merge them. We'll keep talking about it.
Relationships are still hardcoded in the frontend. I added sparkline tabs for RAILS and Components for v8:browsing* test suites. See status. I'd be happy to hardcode more related measurements sparkline tabs. Which test suites should I add sparklines for, and which measurements are related to which?
An update on this: I have some bandwidth, so I'm going to port the summarization into Pinpoint so that it can, at least for the time being, bisect on summary values. A quick point about the bisects @dave-2 posted earlier: it actually looks like only maybe 2 of those failed due to the summary issue; the rest failed to run any stories at all.
https://bugs.chromium.org/p/chromium/issues/detail?id=846864 https://pinpoint-dot-chromeperf.appspot.com/job/159f3952240000 The histogram is present in the histograms json.
@simonhatch