cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Timeseries value magnitudes are greatly affected by temporal resolution #71827

Open bobvawter opened 3 years ago

bobvawter commented 3 years ago

This video from v21.1.10 shows a problem where the magnitude of the metric changes dramatically when you increase the temporal resolution of the chart. At a one-day timescale, the chart shows only ~800 open transactions. When you zoom in, however, it shows more than 5,600 connections open.

https://user-images.githubusercontent.com/1158548/138332945-43ad3c9b-f378-4259-8c97-64e17642ef96.mov

kevin-v-ngo commented 3 years ago

FYI - @thtruo

@dbist has been running into this issue as well.

bobvawter commented 3 years ago

To clarify why this matters: I nearly undersized a cluster's provisioned storage by 20,000 IOPS because I was looking at a week's worth of data on a load-test cluster and eyeballing the maximum peak. I had to revise the estimate upwards after spot-checking the individual peaks.

thtruo commented 3 years ago

Thanks for flagging @bobvawter - it is surprising to see an order of magnitude difference after zooming in to a narrower time scale 🤔

cc @dhartunian and @nathanstilwell IIRC we don't do any server-side downsampling, so this suggests to me it's stemming from how we display values based on the selected time range on the client-side. Could we investigate where this diff is coming from?

petermattis commented 3 years ago

FYI, we do server side roll-ups in the timeseries system, though I'm not sure if this is the problem here. My guess as to the root problem is that we're averaging instead of maintaining the max.
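To illustrate the averaging-versus-max guess, here is a minimal sketch using synthetic numbers. It is not CockroachDB's timeseries code; the `downsample` helper, the bucket size, and the sample values are all assumptions made up for the example. It only shows how an averaging roll-up can flatten a short spike that a max roll-up would keep.

```python
from statistics import mean


def downsample(samples, bucket_size, aggregate):
    """Collapse each consecutive run of `bucket_size` samples into one value.

    `aggregate` is any function that reduces a list to a single number,
    e.g. `mean` or `max`. (Hypothetical helper, for illustration only.)
    """
    return [
        aggregate(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Synthetic data: a mostly flat metric with one short spike.
samples = [100] * 6 + [5600] + [100] * 5

avg_rollup = downsample(samples, 6, mean)  # the spike is diluted by its neighbours
max_rollup = downsample(samples, 6, max)   # the spike survives the roll-up

print("avg:", avg_rollup)
print("max:", max_rollup)
```

With these made-up values, the bucket containing the spike averages out to roughly a fifth of the true peak, which is the same kind of gap a coarser time range would produce if the chart averages each bucket.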

bobvawter commented 3 years ago

Another wrinkle is that some graphs should trend toward the minimum and others toward the maximum. If you were plotting "disk space available", the minimum value is going to be more useful, while "disk space used" would be more useful as a maximum value. One way to solve this would be to plot error bars or ghost lines around the central trend.
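The error-bar idea can be sketched as keeping a minimum, average, and maximum for every downsampled bucket, so the chart can draw the average as the central line and the other two as a band around it. This is only an illustration under assumptions: `Envelope` and `envelope` are hypothetical names, and nothing here reflects how the DB Console actually computes its charts.

```python
from statistics import mean
from typing import NamedTuple


class Envelope(NamedTuple):
    """Summary of one downsampled bucket (hypothetical type)."""
    low: float
    avg: float
    high: float


def envelope(samples, bucket_size):
    """Return one (low, avg, high) summary per run of `bucket_size` samples."""
    buckets = (
        samples[i:i + bucket_size]
        for i in range(0, len(samples), bucket_size)
    )
    return [Envelope(min(b), mean(b), max(b)) for b in buckets]


# Synthetic data, two samples per bucket.
for point in envelope([10, 30, 20, 50], 2):
    print(point)
```

A "disk space available" chart could then emphasize `low` and a "disk space used" chart could emphasize `high`, without committing the stored data to a single aggregation.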

dhartunian commented 2 years ago

@bobvawter we don't have support for MIN aggregation at the moment although I understand your rationale. I've modified two downsamplers on the SQL page to fix this particular issue.

bobvawter commented 1 year ago

The general issue was never really solved.

sean- commented 1 year ago

Proposed fix in https://github.com/cockroachdb/cockroach/pull/110391

bobvawter commented 1 year ago

@sean- what I'm hoping to see at some point are candlestick charts or at least the ability to edit the charts in place when you do want to look at the metric with various aggregations.