cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License

sstable: order range keys consistently in colblk and rowblk encodings #4160

Closed jbowens closed 1 week ago

jbowens commented 1 week ago

Previously, the ordering of range keys within a range key block was muddled and undefined. Before encoding a Span of range keys to the underlying RawWriter, the sstable.Writer type sorted the span's keys by suffix. However, the row-based sstable writer always serializes a Span's RANGEKEYSETs first, followed by its RANGEKEYUNSETs and then its RANGEKEYDELs.

Confusingly, the rowblk fragment iterator also sorted keys by trailer when iterating backwards, but not forwards. The columnar RawWriter preserved the order of keys in the span passed to EncodeSpan, and this order was preserved during iteration.

This commit adapts the sstable.Writer type to sort a range key Span's keys by trailer and then by suffix before encoding. This provides determinism and matches the ordering of keys produced by compactions, which sort the keys by trailer. Additionally, the rowblk fragment iterator is updated to always sort the returned keys by trailer.
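
For illustration, the "trailer, then suffix" ordering can be sketched as below. This is not the code from this change. The `Key` type and the `sortKeysByTrailerAndSuffix` helper are hypothetical stand-ins, and the sketch assumes that the trailer is a single integer packing sequence number and key kind, that trailers sort in descending order, and that ties are broken by ascending bytewise suffix comparison. A real implementation would compare suffixes with the store's configured comparer.

```go
package main

import (
	"bytes"
	"sort"
)

// Key is a hypothetical, simplified stand-in for one range key in a Span.
type Key struct {
	Trailer uint64 // assumed: sequence number and key kind packed together
	Suffix  []byte
	Value   []byte
}

// sortKeysByTrailerAndSuffix orders keys by descending trailer and breaks
// ties by ascending bytewise suffix. Both directions are assumptions made
// for this sketch.
func sortKeysByTrailerAndSuffix(keys []Key) {
	sort.SliceStable(keys, func(i, j int) bool {
		if keys[i].Trailer != keys[j].Trailer {
			return keys[i].Trailer > keys[j].Trailer
		}
		return bytes.Compare(keys[i].Suffix, keys[j].Suffix) < 0
	})
}
```

Because the comparison is a total order over (trailer, suffix), two writers given the same set of keys in different input orders produce the same encoded order, which is the determinism property described above.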
