cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver,kvprober: topology-aware probing #87263

Open tbg opened 2 years ago

tbg commented 2 years ago

Is your feature request related to a problem? Please describe.

The KV/Repl teams want to use the latencies observed by kvprober to power key results related to latency SLOs.

The basic idea is to look at kvprober read/write latencies as a trimmed-down version of ^1, and also at things like the coefficient of variation of these latencies over time (for use in "predictable latencies" KRs, etc).

This is all in its infancy, but in multi-region clusters, there is a conceptual problem with kvprober's current random selection of ranges.

Imagine a multi-region cluster in regions R1, R2, R3 where the round-trip latency between regions is ~100ms, and some tables are replicated for resilience against a regional outage (i.e. global writes), and others replicate within a region.

kvprober, not knowing anything about topology, will mix probes of ranges belonging to different topologies, in effect ending up with a mash of numbers that is likely dominated by cross-region hops.

Describe the solution you'd like

I don't have a solution and I suspect we even need to spend more time defining the problem. Filing this issue to get the discussion started.

Describe alternatives you've considered

Additional context

cc @lunevalex

Jira issue: CRDB-19243

Epic CRDB-39898

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/replication

github-actions[bot] commented 8 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!