cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Surface a better latency profile for statement and transaction fingerprint statistics #72954

Open kevin-v-ngo opened 2 years ago

kevin-v-ngo commented 2 years ago

We've received feedback asking us to surface not only the average latency but also the Max, P99, P90, P50, and Min latencies for a given fingerprint in each aggregation interval. We already surface the standard deviation, but the user reported that it is an indirect way to detect outliers.
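To illustrate why a percentile profile exposes outliers more directly than the standard deviation, here is a minimal sketch. It is not CockroachDB code: the `percentile` and `latency_profile` helpers, the nearest-rank method, and the sample latencies are all assumptions made for illustration.

```python
import statistics


def percentile(sorted_samples, p):
    """Nearest-rank percentile for an integer p in 1..100 (an assumed method)."""
    if not sorted_samples:
        raise ValueError("no samples")
    n = len(sorted_samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, (p * n + 99) // 100)
    return sorted_samples[rank - 1]


def latency_profile(latencies_ms):
    """Summarize one fingerprint's latencies for one aggregation interval."""
    s = sorted(latencies_ms)
    return {
        "min": s[0],
        "p50": percentile(s, 50),
        "p90": percentile(s, 90),
        "p99": percentile(s, 99),
        "max": s[-1],
        "mean": statistics.fmean(s),
        "stddev": statistics.pstdev(s),
    }


# Hypothetical interval: 98 fast executions and 2 slow outliers.
samples = [10.0] * 98 + [500.0] * 2
profile = latency_profile(samples)
for name, value in profile.items():
    print(f"{name:>6}: {value:.1f} ms")
```

In this made-up interval the P50 and P90 stay at 10 ms while the P99 and Max jump to 500 ms, so the outliers are visible at a glance. The mean and standard deviation only hint at them and have to be interpreted together.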

Ideally, they'd like to be able to view our P99 latency time-series metrics (or any other P90 metric), go to the statements overview page for that time period (with persisted stats), sort by P99 latency, and identify the statement fingerprint to troubleshoot.
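The workflow described above (pick a time period, sort by P99, find the fingerprint) could look roughly like the sketch below. The `FingerprintStats` record, its field names, and the sample rows are hypothetical and do not reflect the actual persisted statistics schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FingerprintStats:
    """Hypothetical per-interval row; not the real persisted stats schema."""
    fingerprint: str
    aggregated_ts: datetime
    p99_ms: float


def top_by_p99(rows, start, end, limit=3):
    """Return the rows in [start, end) with the highest P99 latency first."""
    in_window = [r for r in rows if start <= r.aggregated_ts < end]
    return sorted(in_window, key=lambda r: r.p99_ms, reverse=True)[:limit]


# Made-up rows for two hourly aggregation intervals.
rows = [
    FingerprintStats("SELECT * FROM orders WHERE id = _", datetime(2023, 1, 1, 10), 12.0),
    FingerprintStats("UPDATE accounts SET balance = _", datetime(2023, 1, 1, 10), 480.0),
    FingerprintStats("INSERT INTO events VALUES (_)", datetime(2023, 1, 1, 10), 35.0),
    FingerprintStats("UPDATE accounts SET balance = _", datetime(2023, 1, 1, 11), 20.0),
]

# The user saw a P99 spike between 10:00 and 11:00 on the time-series chart.
worst = top_by_p99(rows, datetime(2023, 1, 1, 10), datetime(2023, 1, 1, 11))
for r in worst:
    print(f"{r.p99_ms:8.1f} ms  {r.fingerprint}")
```

Restricting the rows to the interval in which the spike appeared before sorting is what ties the time-series chart to a specific fingerprint.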

From there, they'd be able to view fingerprint details (execution statistics, unique plans, contention information, outlier execution details, etc.).

Jira issue: CRDB-11358

Epic CRDB-32139

kevin-v-ngo commented 1 year ago

FYI @dongniwang

We should probably have a similar chart on the fingerprint details page but will defer to you. This is on serverless metrics today:

[image: chart from the serverless metrics page]

maryliag commented 1 year ago

The new solution should consider the behaviour addressed in #99070

maryliag commented 11 months ago

The remaining task on this issue is to improve the sampling rate, since currently the percentiles are only calculated for statements that were detected by Insights, meaning 100ms or 50ms.