cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver: split key returned from lb splitter equals start key #102136

Closed kvoli closed 1 year ago

kvoli commented 1 year ago

When running YCSB (the official implementation, not the cockroach workload), it is common for a range to contain only a single primary key, with column families.

The key finder would often return a specific key, including the column family, e.g.

key=/Table/104/2/"user2933389304617401955"/7/1

However once the key is stripped down to the row prefix, the split key becomes equal to the start key of the range:

/Table/104/2/‹"user2933389304617401955"›

https://github.com/cockroachdb/cockroach/blob/2c098556b2965b5e73d652579c328823a57df254/pkg/kv/kvserver/split/decider.go#L330-L332

When in fact all keys for the range have the same row prefix:

```
/Table/104/2/"user2933389304617401955"/3/1(l=18191690812.0 r=33516019770.5 c=176643 w=173850.0)
/Table/104/2/"user2933389304617401955"/3/1(l=28430071900.0 r=51824659947.5 c=272525 w=192350.0)
/Table/104/2/"user2933389304617401955"/1/1(l=96395260.0 r=52255685386.0 c=178804 w=225455.5)
/Table/104/2/"user2933389304617401955"/5/1(l=56349411273.0 r=74018673591.5 c=443472 w=239320.0)
/Table/104/2/"user2933389304617401955"/10/2(l=46248238229.0 r=22903273734.0 c=235387 w=1360035.5)
/Table/104/2/"user2933389304617401955"/7/1(l=2704173814.0 r=2443966877.0 c=17323 w=189631.0)
/Table/104/2/"user2933389304617401955"/10/2(l=116399430751.5 r=57343293268.5 c=594670 w=344915.0)
/Table/104/2/"user2933389304617401955"/1/1(l=247598020.0 r=127468013682.5 c=433871 w=290524.5)
/Table/104/2/"user2933389304617401955"/10/2(l=98272960404.0 r=48614753297.5 c=500724 w=572039.5)
/Table/104/2/"user2933389304617401955"/1/1(l=88027078.0 r=45775319934.0 c=156055 w=239185.0)
/Table/104/2/"user2933389304617401955"/3/1(l=16854240414.5 r=30900384852.5 c=162681 w=189920.0)
/Table/104/2/"user2933389304617401955"/7/1(l=14492207753.0 r=13655351691.0 c=96209 w=309470.0)
/Table/104/2/"user2933389304617401955"/9/1(l=58184643182.0 r=40127704585.5 c=334360 w=209900.0)
/Table/104/2/"user2933389304617401955"/1/1(l=246090521.0 r=126432292225.5 c=430185 w=568386.0)
/Table/104/2/"user2933389304617401955"/10/2(l=32570825726.5 r=16178936468.5 c=166393 w=316945.5)
/Table/104/2/"user2933389304617401955"/10/2(l=20666197120.5 r=10346926420.5 c=105773 w=564795.0)
/Table/104/2/"user2933389304617401955"/10/2(l=121420242095.0 r=59855074611.5 c=621746 w=332645.0)
/Table/104/2/"user2933389304617401955"/10/2(l=83880541863.0 r=41514474480.0 c=425466 w=306779.5)
/Table/104/2/"user2933389304617401955"/10/2(l=81626309391.0 r=40388670039.0 c=413419 w=254855.0)
/Table/104/2/"user2933389304617401955"/5/1(l=42809132960.5 r=56484040466.0 c=337605 w=201410.0)
```
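To make the failure mode concrete, here is a rough Go sketch of the row-prefix stripping and why it leaves no usable split key for this range. It works on the pretty-printed key strings rather than encoded keys, and `rowPrefix` and `splitPossible` are hypothetical helpers written for this illustration, not functions from the kvserver or keys packages. It also assumes a single primary key column, as in the keys above.

```go
package main

import (
	"fmt"
	"strings"
)

// rowPrefix is a hypothetical, string-based stand-in for stripping the
// column family suffix from a pretty-printed key. It assumes the layout
// /Table/<tableID>/<indexID>/<primary key column>/<column family suffix...>.
func rowPrefix(key string) string {
	parts := strings.Split(key, "/")
	// parts[0] is empty because the key starts with "/".
	const keep = 5 // "", "Table", tableID, indexID, primary key column
	if len(parts) <= keep {
		return key
	}
	return strings.Join(parts[:keep], "/")
}

// splitPossible reports whether any sampled key, once reduced to its row
// prefix, sorts strictly after the start key of the range. String
// comparison stands in for comparing encoded keys here.
func splitPossible(startKey string, samples []string) bool {
	for _, k := range samples {
		if rowPrefix(k) > startKey {
			return true
		}
	}
	return false
}

func main() {
	const row = `/Table/104/2/"user2933389304617401955"`
	// The distinct sampled keys from the listing above, without the stats.
	samples := []string{
		row + "/1/1", row + "/3/1", row + "/5/1",
		row + "/7/1", row + "/9/1", row + "/10/2",
	}
	fmt.Println(rowPrefix(samples[3]) == row) // true: the split key equals the start key
	fmt.Println(splitPossible(row, samples))  // false: every sample collapses to the same row
}
```

Every sample collapses to the range's start key, so whichever key the finder picks, the stripped result cannot divide the range.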

Whilst this behavior isn't new, it does affect the metrics of the split queue - namely:

queue.split.load_based and queue.split.process.success.

It also keeps the load-based splitter around, bumping 20 counters on every foreground request and artificially introducing latency when there shouldn't be any. That is, if there's no possible split key given the sampled keys, there's no use in still recording weights/keys for splitting.
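As an illustration of the cost being described, the sketch below models a sampler that touches 20 counters per request and shows what skipping that work could look like once the samples are known to yield no split key. The `sampler` type, its fields and `markNoSplitKey` are all hypothetical names for this example and do not reflect how the load-based splitter is actually structured.

```go
package main

import "fmt"

const numSamples = 20 // mirrors the 20 counters mentioned above

// sampler is a hypothetical stand-in for per-range load-based splitter
// state. It is not the kvserver implementation.
type sampler struct {
	counters   [numSamples]int
	noSplitKey bool // set once the samples are known to yield no usable key
	recorded   int  // requests that paid the sampling cost
	skipped    int  // requests that bypassed sampling
}

// markNoSplitKey records that the current samples cannot produce a split
// key, e.g. because they all share the range's start key as row prefix.
func (s *sampler) markNoSplitKey() { s.noSplitKey = true }

// record models the per-request work: touching every sample counter,
// unless the range has already been flagged as unsplittable.
func (s *sampler) record() {
	if s.noSplitKey {
		s.skipped++
		return
	}
	for i := range s.counters {
		s.counters[i]++
	}
	s.recorded++
}

func main() {
	s := &sampler{}
	s.record()         // pays for 20 counter updates
	s.markNoSplitKey() // samples turned out to share one row prefix
	s.record()         // skipped
	s.record()         // skipped
	fmt.Println(s.recorded, s.skipped) // 1 2
}
```

A real change would also need some way to clear the flag again, since the range's contents or workload can change. That is left out of the sketch.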

This issue is to fix the metric.
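One possible shape for the metric fix, sketched under assumptions: only count a load-based split when the split key falls strictly inside the range. The `splitMetrics` struct and its fields are hypothetical and only loosely named after `queue.split.load_based` and `queue.split.process.success`; the end key used in `main` is made up for the example.

```go
package main

import "fmt"

// splitMetrics holds hypothetical counters, loosely named after the two
// split queue metrics mentioned in this issue.
type splitMetrics struct {
	LoadBased      int64
	ProcessSuccess int64
}

// validSplitKey reports whether splitKey lies strictly between the start
// and end keys of the range. String comparison stands in for comparing
// encoded keys.
func validSplitKey(startKey, endKey, splitKey string) bool {
	return splitKey > startKey && splitKey < endKey
}

// recordLoadBasedSplit bumps the counters only for a split key that can
// actually divide the range, and reports whether it did so.
func (m *splitMetrics) recordLoadBasedSplit(startKey, endKey, splitKey string) bool {
	if !validSplitKey(startKey, endKey, splitKey) {
		return false
	}
	m.LoadBased++
	m.ProcessSuccess++
	return true
}

func main() {
	m := &splitMetrics{}
	start := `/Table/104/2/"user2933389304617401955"`
	end := `/Table/104/2/"user3"` // assumed end key, for illustration only

	// The stripped split key equals the start key, so nothing is counted.
	fmt.Println(m.recordLoadBasedSplit(start, end, start), m.LoadBased) // false 0
}
```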

Separately, we should also look into addressing the column family case and whether we should be paying the foreground latency cost on a range that could never split.

Jira issue: CRDB-27303

blathers-crl[bot] commented 1 year ago

Hi @kvoli, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

kvoli commented 1 year ago

Not a serious enough issue to block the .0 release; we will target a fix for the metrics in .1.