Not a serious enough issue to block the .0 release; we will target a fix for the metrics in .1.
When running YCSB (the official benchmark, not `cockroach workload`), it is common for a range to contain only a single primary key, with column families.
The key finder would often return a specific key, including the column family. However, once that key is stripped down to the row prefix, the split key becomes equal to the start key of the range:
https://github.com/cockroachdb/cockroach/blob/2c098556b2965b5e73d652579c328823a57df254/pkg/kv/kvserver/split/decider.go#L330-L332
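For context, the check at the linked lines boils down to roughly the following. This is a paraphrased sketch, not the actual code; `maybeSplitKey` is a hypothetical name, though `keys.EnsureSafeSplitKey` is the real helper that strips a key to its row prefix:

```go
package split

import (
	"bytes"

	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// maybeSplitKey is a hypothetical paraphrase of the linked decider logic:
// the candidate returned by the key finder is truncated to its row prefix
// so the range never splits mid-row, and the result is discarded if it no
// longer sorts after the range's start key.
func maybeSplitKey(candidate, startKey roachpb.Key) roachpb.Key {
	// EnsureSafeSplitKey strips the column family suffix, leaving the
	// row prefix (e.g. /Table/104/2/"user..."/3/1 -> /Table/104/2/"user...").
	rowPrefix, err := keys.EnsureSafeSplitKey(candidate)
	if err != nil {
		return nil
	}
	// With a single row in the range, every candidate truncates to the
	// same row prefix as the start key, so no valid split key exists.
	if bytes.Compare(rowPrefix, startKey) <= 0 {
		return nil
	}
	return rowPrefix
}
```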
In fact, all sampled keys for the range have the same row prefix:
```
/Table/104/2/"user2933389304617401955"/3/1(l=18191690812.0 r=33516019770.5 c=176643 w=173850.0)
/Table/104/2/"user2933389304617401955"/3/1(l=28430071900.0 r=51824659947.5 c=272525 w=192350.0)
/Table/104/2/"user2933389304617401955"/1/1(l=96395260.0 r=52255685386.0 c=178804 w=225455.5)
/Table/104/2/"user2933389304617401955"/5/1(l=56349411273.0 r=74018673591.5 c=443472 w=239320.0)
/Table/104/2/"user2933389304617401955"/10/2(l=46248238229.0 r=22903273734.0 c=235387 w=1360035.5)
/Table/104/2/"user2933389304617401955"/7/1(l=2704173814.0 r=2443966877.0 c=17323 w=189631.0)
/Table/104/2/"user2933389304617401955"/10/2(l=116399430751.5 r=57343293268.5 c=594670 w=344915.0)
/Table/104/2/"user2933389304617401955"/1/1(l=247598020.0 r=127468013682.5 c=433871 w=290524.5)
/Table/104/2/"user2933389304617401955"/10/2(l=98272960404.0 r=48614753297.5 c=500724 w=572039.5)
/Table/104/2/"user2933389304617401955"/1/1(l=88027078.0 r=45775319934.0 c=156055 w=239185.0)
/Table/104/2/"user2933389304617401955"/3/1(l=16854240414.5 r=30900384852.5 c=162681 w=189920.0)
/Table/104/2/"user2933389304617401955"/7/1(l=14492207753.0 r=13655351691.0 c=96209 w=309470.0)
/Table/104/2/"user2933389304617401955"/9/1(l=58184643182.0 r=40127704585.5 c=334360 w=209900.0)
/Table/104/2/"user2933389304617401955"/1/1(l=246090521.0 r=126432292225.5 c=430185 w=568386.0)
/Table/104/2/"user2933389304617401955"/10/2(l=32570825726.5 r=16178936468.5 c=166393 w=316945.5)
/Table/104/2/"user2933389304617401955"/10/2(l=20666197120.5 r=10346926420.5 c=105773 w=564795.0)
/Table/104/2/"user2933389304617401955"/10/2(l=121420242095.0 r=59855074611.5 c=621746 w=332645.0)
/Table/104/2/"user2933389304617401955"/10/2(l=83880541863.0 r=41514474480.0 c=425466 w=306779.5)
/Table/104/2/"user2933389304617401955"/10/2(l=81626309391.0 r=40388670039.0 c=413419 w=254855.0)
/Table/104/2/"user2933389304617401955"/5/1(l=42809132960.5 r=56484040466.0 c=337605 w=201410.0)
```

Whilst this behavior isn't new, it does affect the metrics of the split queue, namely `queue.split.load_based` and `queue.split.process.success`. It also keeps the load-based splitter around, bumping 20 counters on every foreground request, artificially introducing latency where there should be none. That is, if there's no possible split key given the sampled keys, there's no use in continuing to record weights/keys for splitting.
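To make that foreground cost concrete, here is a rough sketch (hypothetical types and names, loosely modeled on the finder in `pkg/kv/kvserver/split`) of what recording does per request: the sampled span is compared against each of the ~20 tracked candidate keys, bumping the counters that appear as l=, r=, and c= in the dump above.

```go
package split

import (
	"bytes"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// sample is a hypothetical stand-in for the finder's per-candidate state.
// The fields mirror the l=, r=, and c= counters in the dump above.
type sample struct {
	key         roachpb.Key
	left, right float64
	contained   int
}

// record sketches the bookkeeping done on every foreground request:
// each sampled span is compared against all ~20 tracked candidates,
// bumping counters even when no candidate can become a valid split key.
func record(samples []sample, span roachpb.Span, weight float64) {
	end := span.EndKey
	if len(end) == 0 {
		end = span.Key // treat point requests as single-key spans
	}
	for i := range samples {
		s := &samples[i]
		switch {
		case bytes.Compare(end, s.key) < 0:
			s.left += weight // request lies entirely left of the candidate
		case bytes.Compare(span.Key, s.key) >= 0:
			s.right += weight // request lies entirely right of the candidate
		default:
			s.contained++ // request straddles the candidate key
		}
	}
}
```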
This issue is to fix the metrics.
Separately, we should also look into how column families are handled here, and whether we should be paying the foreground latency cost on a range that could never split.
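One possible direction, purely as a sketch of the idea (reusing the hypothetical `maybeSplitKey` and `sample` from the sketches above): once every tracked candidate truncates to a row prefix at or before the range start key, further recording is wasted work and the splitter state could be dropped.

```go
// noSplitKeyPossible is a hypothetical helper: if no tracked candidate
// can produce a valid split key, the splitter could stop recording
// weights on foreground requests instead of sticking around.
func noSplitKeyPossible(samples []sample, startKey roachpb.Key) bool {
	for _, s := range samples {
		if maybeSplitKey(s.key, startKey) != nil {
			return false // some candidate could still split the range
		}
	}
	return len(samples) > 0
}
```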
Jira issue: CRDB-27303