On the Metrics page of the DB console, graphs like "Service Latency: SQL Statements, 99th percentile" plot each point by averaging the set of values that the point represents. Averaging hides outliers, which are critically important in a 99th-percentile graph. An option to toggle these graphs to a max aggregation would be helpful for users, and for TSEs and L2 engineers debugging customer issues.
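To illustrate the concern, here is a minimal sketch of how the two aggregations treat the same underlying samples. The sample values and the `downsample` helper are hypothetical and are not taken from the DB console or tsdb code; they only show why an average can mask a spike that a max would preserve.

```python
# Hypothetical p99 latency samples (ms) that all fall into one plotted point.
samples_ms = [12.0, 11.0, 13.0, 250.0, 12.0, 11.0]


def downsample(values, aggregator):
    """Collapse the raw samples behind one plotted point into a single value."""
    if aggregator == "avg":
        return sum(values) / len(values)
    if aggregator == "max":
        return max(values)
    raise ValueError(f"unknown aggregator: {aggregator}")


avg_point = downsample(samples_ms, "avg")
max_point = downsample(samples_ms, "max")

# The 250 ms spike is diluted by the average but kept by the max.
print(f"avg: {avg_point:.1f} ms, max: {max_point:.1f} ms")
```

With these made-up numbers, the averaged point sits at roughly a fifth of the spike, which is the effect the requested toggle is meant to avoid.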
A couple of notes for context; these don't really address the requested toggle.
Two aggregations are at play when viewing cluster-wide metrics: downsampling over time and aggregation across nodes. I think the average is used for both of these in the percentile graphs.
On the custom chart page you have control over both.
Strictly speaking, all of the tsdb options are wrong for combining percentiles (since we don't maintain histograms, etc.). This is part of the reason why I think we should move exclusively to Prometheus for CC.
For outliers, though, we also have p99.9 and p99.99 timeseries, and even more importantly pMax. Perhaps we should surface these more. For pMax in particular, the maximum clearly seems like the right default downsampler.
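As a rough illustration of why combining already-computed percentiles is lossy, the sketch below compares the average of two per-node p99 values with the p99 of the pooled samples. The node data and the nearest-rank `percentile` helper are assumptions made for this example, not how tsdb or the DB console computes anything.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical per-node latencies (ms). node_b is slower and serves most of the traffic.
node_a = [10.0] * 100
node_b = [10.0] * 880 + [500.0] * 20

p99_a = percentile(node_a, 99)
p99_b = percentile(node_b, 99)

# What an "average across nodes" of the per-node p99 series would report.
avg_of_p99 = (p99_a + p99_b) / 2

# What the p99 over all requests actually is, which needs the raw samples (or histograms).
true_p99 = percentile(node_a + node_b, 99)

print(f"avg of per-node p99: {avg_of_p99} ms, pooled p99: {true_p99} ms")
```

In this made-up case the averaged value is about half of the pooled p99. Taking the max of the per-node p99 values is not exact either, but under this percentile definition it can never fall below the pooled p99, so it at least errs in the direction that keeps outliers visible.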
Jira issue: CRDB-36913