Closed dave-2 closed 5 years ago
@simonhatch: (on https://chromium-review.googlesource.com/c/catapult/+/1087595)
This looks like it's a summary metric, no? https://chromeperf.appspot.com/group_report?bug_id=846864
My guess is this is just a byproduct of the differences in how histograms do summarization. Looking at https://chromeperf.appspot.com/report?sid=bfecd6a30c8691a8d0c7a1ec048e309681a6776bfbb5d8a1fbbfcd36be543607&start_rev=551859&end_rev=563958, no individual pages move but the summary as a whole does (histograms summarizes slightly differently, this is known and was announced a while back).
Ethan, is that correct?
@eakuefner
Yes, this is a summary metric. The reason Pinpoint cannot find it is because it is not present in the HistogramSet as produced by Telemetry. Instead, it is only produced by add_reserved_diagnostics, which runs after Telemetry runs but before the data is uploaded to the perf dashboard.
Probably this is WAI, and possibly we could show a warning on the Dashboard side (or perhaps even outright disallow this, if we know we're going to be using histograms on the Pinpoint side, which I think is now the default, IIRC).
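To illustrate the point above: this is a purely hypothetical sketch (not the actual catapult code; the data shapes and the `add_summaries` helper are invented) of a post-processing step like add_reserved_diagnostics, which merges per-story histograms into a summary that only exists after Telemetry has finished running.

```python
from collections import defaultdict

def add_summaries(histograms):
    """histograms: list of dicts like {'name': ..., 'story': ..., 'samples': [...]}.

    Returns the input plus one merged summary entry per metric name
    (marked here with story=None)."""
    by_name = defaultdict(list)
    for h in histograms:
        by_name[h['name']].extend(h['samples'])
    summaries = [{'name': name, 'story': None, 'samples': samples}
                 for name, samples in by_name.items()]
    return histograms + summaries

# What Telemetry itself produces: per-story histograms only.
telemetry_output = [
    {'name': 'cpu_time', 'story': 'page_a', 'samples': [10.0]},
    {'name': 'cpu_time', 'story': 'page_b', 'samples': [30.0]},
]

# The summary (story=None) appears only after this post-processing step,
# which is why Pinpoint, reading raw Telemetry output, cannot find it.
merged = add_summaries(telemetry_output)
```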
Does that mean there will be alerts that can't be bisected on?
Yes, but I think in that case, it would be worthwhile to audit the alerting configs that are producing these un-bisectable alerts. As you know, in general our team does not recommend alerting on summary metrics because they are bound to be noisy as a matter of course.
Ideally, as @simonhatch and I have discussed offline with @nedn, we'd move past the current model where we upload these summaries explicitly and store them as timeseries, to something more lightweight where we fetch and maybe cache summaries that matter (e.g. because they are featured in a V2SPA releasing report). I think @benshayden has done a little bit of thinking around the idea of merging histograms on the fly in V2SPA, so maybe it's time we had a conversation around "summaries and how to get rid of them".
V2SPA already merges timeseries on the fly. It doesn't yet afford starting pinpoint jobs from the chart, but when it does, I can make it require the user to drill down into a test-case-specific timeseries first.
I'm not sure if we'd want to remove summary timeseries from the database. It's much faster to load a single timeseries than 20 or 100. If we can refactor TestMetadata from test paths to descriptors, I might actually like to store timeseries that summarize across all bots so that users don't need to select a bot when browsing charts -- they could choose to drilldown through related measurements or test case sparklines first if they prefer.
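A rough sketch of the kind of on-the-fly merging described above (this is illustrative only, not the V2SPA implementation; the function name and data shapes are invented): instead of loading a precomputed summary timeseries, average several per-test-case timeseries pointwise at each revision.

```python
def merge_timeseries(series_list):
    """series_list: list of {revision: value} dicts, one per test case.

    Returns a single merged {revision: mean_value} timeseries. Revisions
    missing from some series are averaged over the series that have them."""
    buckets = {}
    for series in series_list:
        for rev, value in series.items():
            buckets.setdefault(rev, []).append(value)
    return {rev: sum(vals) / len(vals) for rev, vals in buckets.items()}

# Two per-page timeseries merged into one summary-like series.
merged_series = merge_timeseries([{1: 2.0, 2: 4.0}, {1: 4.0}])
```

The cost concern in the comment above is visible here: merging N series means fetching all N from storage first, which is why a stored summary timeseries may remain a necessary optimization.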
Auditing alerting configs to remove summary timeseries SGTM. Is there a bug for that?
Should we simply disallow summary-metric bisects from the dashboard UI? Or show a big fat warning plus suggestions on what to do instead?
https://pinpoint-dot-chromeperf.appspot.com/job/169a849b240000
https://pinpoint-dot-chromeperf.appspot.com/job/15b12d63240000
https://pinpoint-dot-chromeperf.appspot.com/job/11e7a563240000
https://pinpoint-dot-chromeperf.appspot.com/job/1286a763240000
https://pinpoint-dot-chromeperf.appspot.com/job/16dbb8bb240000
https://pinpoint-dot-chromeperf.appspot.com/job/11a876ab240000
https://pinpoint-dot-chromeperf.appspot.com/job/16e5e483240000
https://pinpoint-dot-chromeperf.appspot.com/job/16d53afd240000
https://pinpoint-dot-chromeperf.appspot.com/job/12fb718b240000
https://pinpoint-dot-chromeperf.appspot.com/job/11dd3eab240000
Ping, this is the leading cause of Pinpoint failures.
https://pinpoint-dot-chromeperf.appspot.com/job/11a055c0a40000
https://pinpoint-dot-chromeperf.appspot.com/job/12870ef3240000
https://pinpoint-dot-chromeperf.appspot.com/job/119d9c3f240000
https://pinpoint-dot-chromeperf.appspot.com/job/16b84257240000
https://pinpoint-dot-chromeperf.appspot.com/job/12fa2ed3240000
https://pinpoint-dot-chromeperf.appspot.com/job/14b4876b240000
Uhm, I have to object to the decision to simply disallow bisection on summary metrics (though maybe I have a different problem that can be solved in other ways).
For runtime-call stats we have sub/summed metrics precisely for the purpose of reducing noise. Here is the example hierarchy:
Usually the regression on the top-level V8-Total metric is too small, while you get a very stable and strong signal on the sub-metric. That is the very reason why, at least in our case, we do want to bisect on the sub-metric. Note that V8-Total is the sum of all V8 metrics, and that each V8 sub-metric is the sum of many small sub-sub-metrics.
I'm currently unable to address the majority of detected regressions manually because of this bug, since there are too many V8 CLs in the revision range.
Hmm, can you clarify that a bit more? In your example, why can't you find a sub-metric to bisect on instead of the summary metric? You mentioned that you disagree with disallowing bisection on summary metrics, but your example goes on to choose a sub-metric precisely because the summary regression is too small.
I hope I get this right :).
Some background first: to clarify, V8-Total is the sum of all V8 component metrics (V8-C++, V8-Compiler, V8-GC, ...), and those metrics themselves are sums of the actual measurement scopes from within V8 (for V8-C++ this might be something like V8-Array-join, V8-Allocate-Array, ...). In total we have on the order of 200 low-level metrics, too many to track, so we don't have alerts on them and hardly ever want to bisect at that level.
For the next-gen benchmarks we will have a similar structure of metrics for the V8 memory: V8-Total-memory / (V8-new-space, V8-old-space...) / (memory usage by instance type).
Bisection usually happens on the summary metrics over V8 components (V8-C++, V8-Compiler and friends) since those conveniently overlap with sub-team responsibilities.
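The two-level roll-up described above can be sketched like this (illustrative only; the scope names and numbers are made up, not real runtime-call-stats data): low-level scopes sum into per-component metrics, which in turn sum into V8-Total.

```python
# Hypothetical low-level measurement scopes, grouped by V8 component.
SCOPES = {
    'V8-C++':      {'V8-Array-join': 5.0, 'V8-Allocate-Array': 3.0},
    'V8-Compiler': {'V8-Parse': 2.0, 'V8-Optimize': 4.0},
}

# Component metrics (V8-C++, V8-Compiler, ...) are sums of their scopes;
# these are the level teams usually alert and bisect on.
components = {name: sum(subs.values()) for name, subs in SCOPES.items()}

# V8-Total is the sum of all component metrics -- often too small and
# too noisy a signal to bisect on directly.
v8_total = sum(components.values())
```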
Do the following statements make sense?
Gotcha, so to summarize: there could be a lot of per-page metrics to dig through, so it makes more sense for you to alert on the summary and then bisect on either a per-page metric or a sub-summary.
If we were to switch to per-page alerts, we do have more intelligent grouping available now that should be able to group alerts from related metrics together; is that still less desirable?
We could try out how this works on v8.browsing_stories; I don't have a good notion of what would change.
@benshayden Is the metric relationship data still hardcoded, or can you fetch it from the diagnostic now? How much work is it to add v8's?
First, to clarify "removing summary timeseries": I'd like to investigate performance a bit before removing them from the database. It might be unbearably slow to fetch 100 timeseries and merge them -- storing summary timeseries might be a necessary performance optimization. In that case, it might make architectural sense to merge them in add_histograms_queue instead of requiring uploaders to merge them. We'll keep talking about it.
Relationships are still hardcoded in the frontend. I added sparkline tabs for RAILS and Components for v8:browsing* test suites. See status. I'd be happy to hardcode more related measurements sparkline tabs. Which test suites should I add sparklines for, and which measurements are related to which?
An update on this: I have some bandwidth, so I'm going to port the summarization into Pinpoint so that it can, at least for the time being, bisect on summary values. A quick point about the bisects @dave-2 posted earlier: it actually looks like only maybe 2 of those failed due to the summary issue; the rest failed to run any stories at all.
https://bugs.chromium.org/p/chromium/issues/detail?id=846864 https://pinpoint-dot-chromeperf.appspot.com/job/159f3952240000 The histogram is present in the histograms json.
@simonhatch