cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

ui: std deviation not visible on statements list page #49706

Closed piyush-singh closed 1 year ago

piyush-singh commented 4 years ago

While @nstewart was exploring slow statements in a cluster, he noted that overall p99 latencies were high, but it was difficult to spot the offending queries on the statements list page. After digging into the statement details, he was able to find a query with very high variance:

image

However, this isn't visible on the statements list unless you hover over the statement in question:

image

We should explore how we can make this easier to spot without requiring hovering over each statement. This may be a bug/regression on this page from the design update.

cc @Annebirzin and @awoods187

Jira issue: CRDB-4206

dhartunian commented 4 years ago

The scaling here is based on other items in the list so there's likely a separate item that has a really high mean.
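To illustrate the scaling point, here is a rough sketch. It assumes (this is not confirmed anywhere in this thread) that each latency bar is drawn as a fraction of the largest mean-plus-std-dev value in the list. The function name, statement names, and numbers are all made up for illustration.

```python
def bar_fractions(statements):
    """Map each statement name to (mean_fraction, stddev_fraction).

    `statements` is a list of (name, mean_ms, stddev_ms) tuples.
    Fractions are relative to the largest mean + stddev in the list,
    which is an assumed scaling rule, not the actual UI logic.
    """
    scale = max((mean + stddev for _, mean, stddev in statements), default=0)
    if scale == 0:
        return {name: (0.0, 0.0) for name, _, _ in statements}
    return {
        name: (mean / scale, stddev / scale)
        for name, mean, stddev in statements
    }


# Hypothetical list: one statement with a huge mean, one with high variance.
fractions = bar_fractions([
    ("slow_scan", 5000.0, 100.0),
    ("spiky_lookup", 20.0, 200.0),
])
```

Under that assumption, `spiky_lookup` has a std dev ten times its mean, yet its std dev segment takes up under 4% of the bar width because `slow_scan` sets the scale.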

Would be cool to explore adding a ratio of stdev/mean (https://en.wikipedia.org/wiki/Coefficient_of_variation) which would allow comparison but I wonder if this would cause more confusion since it's not a measure that folks are as familiar with.
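As a quick sketch of what that ratio would look like, assuming each statement already exposes a mean and a std dev in the same unit (the function name and the sample numbers below are hypothetical):

```python
def coefficient_of_variation(mean_ms, stddev_ms):
    """Return stddev / mean, or None when the mean is not positive.

    The ratio is unitless, so statements with very different absolute
    latencies can be compared on the same footing.
    """
    if mean_ms <= 0:
        return None
    return stddev_ms / mean_ms


# Hypothetical statements: a slow but steady one and a fast but erratic one.
steady_cv = coefficient_of_variation(mean_ms=5000.0, stddev_ms=100.0)
spiky_cv = coefficient_of_variation(mean_ms=20.0, stddev_ms=200.0)
```

Here the steady statement comes out at 0.02 and the erratic one at 10.0, even though the erratic one has a far smaller mean.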

Another option could be to have a ratio past which we show a yellow ⚠️ or something by the statement to highlight it to the customer.
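A minimal sketch of the warning idea, assuming the ratio is std dev divided by mean. The threshold of 1.0, the `StatementStats` type, and the sample fingerprints are placeholders for illustration, not values anyone has proposed here.

```python
from dataclasses import dataclass


@dataclass
class StatementStats:
    fingerprint: str
    mean_ms: float
    stddev_ms: float


# Hypothetical cutoff: warn when the std dev exceeds the mean.
VARIANCE_WARNING_RATIO = 1.0


def needs_warning(stmt, threshold=VARIANCE_WARNING_RATIO):
    """True when stddev/mean is above the threshold (never for mean <= 0)."""
    if stmt.mean_ms <= 0:
        return False
    return stmt.stddev_ms / stmt.mean_ms > threshold


def list_label(stmt):
    """Prefix the statement text with a warning marker when flagged."""
    prefix = "⚠️ " if needs_warning(stmt) else ""
    return prefix + stmt.fingerprint


rows = [
    StatementStats("SELECT * FROM orders WHERE id = _", 20.0, 200.0),
    StatementStats("SELECT count(*) FROM events", 5000.0, 100.0),
]
labels = [list_label(row) for row in rows]
```

Only the first row would get the marker in this example, so the high-variance statement stands out without any hovering.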

nstewart commented 4 years ago

Any reason we can't explicitly show the std dev as a column? Def don't want to suggest a specific design, but if we can't find one that is clear - that approach might be helpful. It looks like we talk about pairs of means and std dev throughout the UI.

Or if we want to roughly combine a mean and stdv into one number, why not start with a p99 in the column? I'm sure we put some thought into this already, but one thing I'm confused by is our starting with "Service Latency: SQL, 99th percentile" in the main metrics page, then our transition to talking about mean latency in the statements page. Why do we make that switch in the type of latency we show?

nstewart commented 4 years ago

@ajwerner's point of view on arbitrary latencies: https://github.com/cockroachdb/cockroach/issues/49658#issuecomment-636115794

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!