cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

observability, jobs: numerators, denominators and units in progress metrics #67667

Open shermanCRL opened 3 years ago

shermanCRL commented 3 years ago

Problem

Currently, the progress metric of a job is a single percentage number, and doesn’t tell the user what measurement contributes to that percentage. “30% of what?” they might ask.

A common user story is observing a job, but struggling to know if it’s slow vs stalled, or which parts have completed, or what “complete” means.

Desired solution

A facility to record & display a numerator and denominator in the progress of a job, with arbitrary labels (i.e. units). For example, “230MB of 500MB complete”.
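As a rough illustration of the display side, here is a minimal Go sketch that renders a numerator, denominator and unit label into the kind of string described above. `FormatProgress` is a hypothetical helper invented for this example, not an existing CockroachDB function, and the fallback for an unknown total is an assumption.

```go
package main

import "fmt"

// FormatProgress renders a numerator and denominator with a unit label.
// Hypothetical helper for illustration only; not part of CockroachDB.
func FormatProgress(numerator, denominator float64, unit string) string {
	if denominator <= 0 {
		// Assumption: when the total is unknown, show only the completed amount.
		return fmt.Sprintf("%g%s complete", numerator, unit)
	}
	return fmt.Sprintf("%g%s of %g%s complete", numerator, unit, denominator, unit)
}

func main() {
	fmt.Println(FormatProgress(230, 500, "MB")) // prints "230MB of 500MB complete"
}
```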

This is intended for DB Console UI and end-user observability. (Maybe Prometheus too?) It is not intended as a “functional” metric for internal state tracking, but if it turns out useful elsewhere, great.

I have to believe that when we calculate those % completes, we have a numerator and denominator, right? Let’s let the user see them.

(Further ambition: make it a time series so users can observe slowdowns or long tails.)

Complementary idea:

Jira issue: CRDB-8696

Epic CRDB-32144

shermanCRL commented 3 years ago

Thinking that the structure might be something like:

ProgressMetric
  Label string                  // typically units like 'ranges' or 'bytes'
  Numerator float               // units completed so far
  Denominator float             // total units to be completed (may be an estimate)
  LastUpdated time              // last time this metric was updated
  StartTime? time, nullable     // when we started executing this phase

...and at the top level, there would be an array of ProgressMetric.
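To make the shape concrete, the structure could be sketched in Go roughly as follows. This is only one possible reading of the proposal: the field types, the `JobProgress` wrapper, and the `Fraction` and `Stalled` helpers are assumptions made for illustration, not existing CockroachDB code. `Stalled` shows how `LastUpdated` might help tell "slow" from "stalled".

```go
package main

import (
	"fmt"
	"time"
)

// ProgressMetric mirrors the proposed structure; the concrete types are assumptions.
type ProgressMetric struct {
	Label       string     // typically units like "ranges" or "bytes"
	Numerator   float64    // units completed so far
	Denominator float64    // total units to be completed (may be an estimate)
	LastUpdated time.Time  // last time this metric was updated
	StartTime   *time.Time // when this phase started executing; nil if unknown
}

// Fraction returns completed/total, or 0 when the total is unknown.
func (m ProgressMetric) Fraction() float64 {
	if m.Denominator <= 0 {
		return 0
	}
	return m.Numerator / m.Denominator
}

// Stalled reports whether an unfinished metric has gone longer than
// threshold without an update. Hypothetical heuristic, not a real API.
func (m ProgressMetric) Stalled(now time.Time, threshold time.Duration) bool {
	return m.Numerator < m.Denominator && now.Sub(m.LastUpdated) > threshold
}

// JobProgress is a hypothetical top-level holder for the array of metrics.
type JobProgress struct {
	Metrics []ProgressMetric
}

func main() {
	now := time.Now()
	start := now.Add(-10 * time.Minute)
	job := JobProgress{Metrics: []ProgressMetric{
		{Label: "bytes", Numerator: 230e6, Denominator: 500e6,
			LastUpdated: now.Add(-5 * time.Second), StartTime: &start},
		{Label: "ranges", Numerator: 12, Denominator: 40,
			LastUpdated: now.Add(-3 * time.Minute), StartTime: &start},
	}}
	for _, m := range job.Metrics {
		fmt.Printf("%g of %g %s (%.0f%%), stalled=%t\n",
			m.Numerator, m.Denominator, m.Label,
			100*m.Fraction(), m.Stalled(now, time.Minute))
	}
}
```

Keeping one metric per unit lets a job report several phases or measures side by side, and the same records could later feed a time series if updates are retained.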

Having this primitive would allow job implementors to do cool stuff.