Open coryfoo opened 9 years ago
Range partitioning is commonly brought up in the context of time series data. Specifically, users often wish to define a partition "width" and add new partitions for the "current" data as time progresses (i.e. no need to predefine the entire range of time). Another variant is to use a special bucket capped at "infinity" and change its upper bound to something reasonable once it has a certain amount of data.
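To make the two variants above concrete, here is a minimal in-memory sketch. It is not pg_shard code: the `RangePartitions` class and its `roll_forward` and `cap_last` methods are hypothetical names invented for illustration, and partitions are modeled only as sorted lower bounds, with the last one open-ended ("capped at infinity").

```python
import bisect

INFINITY = float("inf")


class RangePartitions:
    """Hypothetical model of range partitions; not a pg_shard API."""

    def __init__(self, start, width):
        self.width = width
        # Sorted lower bounds. Partition i covers [lowers[i], lowers[i + 1]);
        # the last partition is open-ended, i.e. capped at infinity.
        self.lowers = [start]

    def bounds(self, i):
        """Return the (lower, upper) bounds of partition i, upper exclusive."""
        upper = self.lowers[i + 1] if i + 1 < len(self.lowers) else INFINITY
        return self.lowers[i], upper

    def find(self, value):
        """Return the index of the partition that contains value."""
        i = bisect.bisect_right(self.lowers, value) - 1
        if i < 0:
            raise KeyError(f"{value} is below the first partition")
        return i

    def roll_forward(self, now):
        """Variant 1: add fixed-width partitions until `now` is in the newest one."""
        while now >= self.lowers[-1] + self.width:
            self.lowers.append(self.lowers[-1] + self.width)

    def cap_last(self, upper):
        """Variant 2: give the open-ended partition a finite upper bound
        by starting a new open-ended partition at `upper`."""
        if upper <= self.lowers[-1]:
            raise ValueError("upper bound must exceed the last lower bound")
        self.lowers.append(upper)


parts = RangePartitions(start=0, width=100)
parts.roll_forward(now=250)   # creates partitions starting at 100 and 200
parts.cap_last(upper=1000)    # closes [200, 1000) and opens [1000, infinity)
```

In both variants only the newest partition ever changes, which is why the pattern fits time series data: existing partitions keep their bounds and never need their rows moved.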
> the addition of more shards after the initial process is not supported
With `long` values, that pattern doesn't seem to apply. How would you add shards? Presumably if the data can take on any `long` value, you'll need a shard for every possible value, but then the concept of adding shards doesn't apply. Would you need to "split" an existing shard, instead?
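As a rough illustration of what "splitting" could mean here, the sketch below starts with one shard covering the entire signed 64-bit range and splits it at chosen points. The `split_shard` helper and the inclusive `(low, high)` tuples are assumptions made for this example, not anything pg_shard provides, and the sketch ignores the hard part, which is moving the rows that belong to the new shard.

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


def split_shard(shards, index, split_point):
    """Return a new shard list in which shards[index], an inclusive
    (low, high) range, is replaced by (low, split_point - 1) and
    (split_point, high). Hypothetical helper; no data is moved."""
    low, high = shards[index]
    if not low < split_point <= high:
        raise ValueError("split point must fall strictly inside the shard")
    return (
        shards[:index]
        + [(low, split_point - 1), (split_point, high)]
        + shards[index + 1:]
    )


# One shard initially covers every possible long value.
shards = [(LONG_MIN, LONG_MAX)]
shards = split_shard(shards, 0, 0)      # negatives | non-negatives
shards = split_shard(shards, 1, 1000)   # split the non-negative shard again
```

Because every split replaces one range with two adjacent ranges, the shard list always covers the full `long` domain with no gaps or overlaps, so no value is ever left without a shard.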
> One could imagine a scenario where multiple ranges might want to map to the same shard
As for a scenario where "multiple ranges map to the same shard"… can you elaborate? Do you mean a node will have multiple ranges sitting on it (that's already done), or is it crucial that our shard concept cover multiple ranges? Keep in mind that `pg_shard` places many distinct shards on each node, so it's already quite easy to have two shards covering different ranges on one node.
Due to specific customer demands, we need the ability to partition using pre-defined ranges of values. For us, it is `long` values, but I suppose for others it could be alphabetical, or whatever. A couple of issues immediately come to mind with this problem: