apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Special case shard splitting for sequential insert workloads #1322

Open alexmiller-apple opened 5 years ago

alexmiller-apple commented 5 years ago

Currently, the logic used to split a shard that has grown too large is to choose a new boundary key that cuts the shard in half. Assuming a vaguely well distributed workload, this is a good choice.

However, workloads that always append sequentially are reasonably common. Most notably: any key that contains a versionstamp. In this case, we'd be far better off declaring that the shard should be split by drawing the new boundary at the last currently existing key, and letting the workload proceed to fill up the new, empty shard.

I would imagine that tracking whether >99% of write requests land past the end of one of the byte samples shouldn't be terribly hard to rig up?
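
Roughly what I have in mind, as a standalone sketch (the types, names, and thresholds here are made up for illustration; the real storage server byte sample looks nothing like this):

```cpp
#include <cstdint>
#include <string>

// Illustrative only: stand-ins for the storage server's byte sample and keys.
struct ByteSampleView {
    std::string largestKey;   // last key currently in the sample
    std::string midpointKey;  // key that splits the sampled bytes in half
};

// Counts how many sampled writes land at or beyond the largest sampled key.
struct AppendDetector {
    int64_t sampledWrites = 0;
    int64_t writesPastEnd = 0;

    void onSampledWrite(const std::string& key, const ByteSampleView& sample) {
        ++sampledWrites;
        if (key >= sample.largestKey)
            ++writesPastEnd;
    }

    bool looksSequential() const {
        // Demand a minimum number of observations before trusting the signal,
        // then require >99% of sampled writes to be appends.
        return sampledWrites >= 1000 && writesPastEnd > 0.99 * sampledWrites;
    }
};

// Split at the last existing key for append-only shards, so the new (right)
// shard starts empty and simply fills up; otherwise keep the midpoint split.
std::string chooseSplitKey(const ByteSampleView& sample, const AppendDetector& d) {
    return d.looksSequential() ? sample.largestKey : sample.midpointKey;
}
```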

xumengpanda commented 5 years ago

The motivation to solve the sequential-appending problem is clear and reasonable.

But what if the workload later changes from sequential appending to random access, or the sequential-append detection makes an incorrect prediction/decision? What is the bad case then? We probably don't want to fall back to the even-split mode, because that will cause a LOT of data movement.

If the shard splitting could be based on statistics of data size (and access rate) in each shard range, that would be perfect. :)
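
For example, if each shard kept even a coarse histogram of recent write traffic per sampled key, the split point could be chosen so the two halves receive roughly equal write rate rather than equal bytes (purely illustrative, none of this exists today):

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: (sampled key, recent writes observed near that key).
using WriteHistogram = std::vector<std::pair<std::string, int64_t>>;

// Pick the first key at which the cumulative write rate reaches half of the
// total, so both resulting shards see roughly equal write traffic.
std::string splitByWriteRate(const WriteHistogram& hist) {
    int64_t total = 0;
    for (const auto& entry : hist)
        total += entry.second;

    int64_t seen = 0;
    for (const auto& [key, writes] : hist) {
        seen += writes;
        if (2 * seen >= total)
            return key;
    }
    return hist.empty() ? std::string() : hist.back().first;
}
```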

alexmiller-apple commented 5 years ago

I wasn't under the impression that changing where we decide to split a shard would cause mass data redistribution? Don't we just split ranges when they're too large, and otherwise, not care about them?

I did forget though that storage servers don't know about shards, so having them track "is this key inserted past the end of data in this shard" is harder than I first thought.