Opened by tbg 2 years ago
cc @cockroachdb/replication
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
Is your feature request related to a problem? Please describe.
The KV and Replication teams want to use the latencies observed by kvprober to power key results (KRs) related to latency SLOs.
The basic idea is to treat kvprober read/write latencies as a trimmed-down version of ^1, and also to track metrics such as the coefficient of variation of these latencies over time (for use in "predictable latencies" KRs, etc.).
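The coefficient of variation is the standard deviation of the latencies divided by their mean (σ/μ), so tracking it "over time" amounts to computing it per time window. A minimal sketch, assuming probe latencies are already collected in milliseconds and grouped into fixed-size windows of consecutive probes; the function names are hypothetical and not part of kvprober, and the choice of population (rather than sample) standard deviation is also an assumption.

```python
import statistics
from typing import List, Sequence


def coefficient_of_variation(latencies_ms: Sequence[float]) -> float:
    """Return stddev / mean for one batch of probe latencies.

    Assumption: uses the population standard deviation (pstdev).
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    mean = statistics.fmean(latencies_ms)
    if mean == 0:
        raise ValueError("mean latency is zero; CV is undefined")
    return statistics.pstdev(latencies_ms) / mean


def windowed_cv(latencies_ms: Sequence[float], window: int) -> List[float]:
    """Compute the CV over consecutive, non-overlapping windows of probes.

    A trailing partial window is dropped so every value covers `window` samples.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        coefficient_of_variation(latencies_ms[i:i + window])
        for i in range(0, len(latencies_ms) - window + 1, window)
    ]
```

For example, `windowed_cv([10, 10, 10, 10, 5, 15, 5, 15], 4)` yields `[0.0, 0.5]`: both windows have a 10 ms mean, but only the second is unpredictable. A lower CV would correspond to "more predictable" latency in the sense of the KRs above.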
This is all in its infancy, but in multi-region clusters there is a conceptual problem with kvprober's current random selection of ranges. Imagine a multi-region cluster in regions R1, R2, R3 where the round-trip latency between regions is ~100ms, and some tables are replicated for resilience against a regional outage (i.e. global writes), and others replicate within a region.
kvprober, not knowing anything about topology, will mix probes of ranges belonging to different topologies, in effect ending up with a mash of numbers that is likely dominated by cross-region hops.
Describe the solution you'd like
I don't have a solution, and I suspect we need to spend more time defining the problem first. Filing this issue to get the discussion started.
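As one way to make the problem concrete (not a proposed solution), the sketch below contrasts a single pooled median with per-topology medians. It assumes each probe result could be tagged with a hypothetical topology label such as "regional" or "global"; kvprober attaches no such label today, and the latencies in the usage note are made-up numbers for illustration only.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A probe result: (hypothetical topology label, latency in milliseconds).
Probe = Tuple[str, float]


def median_by_topology(probes: Sequence[Probe]) -> Dict[str, float]:
    """Group probe latencies by their (hypothetical) topology label and take each group's median."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for label, latency_ms in probes:
        groups[label].append(latency_ms)
    return {label: statistics.median(vals) for label, vals in groups.items()}


def pooled_median(probes: Sequence[Probe]) -> float:
    """Median over all probes, ignoring topology, i.e. what random range selection mixes together."""
    return statistics.median([latency_ms for _, latency_ms in probes])
```

With the made-up probes `[("regional", 2.0), ("regional", 3.0), ("regional", 4.0), ("global", 100.0), ("global", 105.0), ("global", 110.0), ("global", 102.0)]`, the per-topology medians are 3.0 ms and 103.5 ms, while the pooled median is 100.0 ms: the single number reflects cross-region hops and says almost nothing about region-local latency.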
Describe alternatives you've considered
Additional context
cc @lunevalex
Jira issue: CRDB-19243
Epic: CRDB-39898