citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0

Support range partitioning #75

Open coryfoo opened 9 years ago

coryfoo commented 9 years ago

Due to specific customer demands, we need the ability to partition using pre-defined ranges of values. For us, it is long values, but I suppose for others it could be alphabetical, or whatever. A couple of issues immediately come to mind with this problem:

  1. AFAIK, the addition of more shards after the initial process is not supported. This would likely need to change as pre-determining all the ranges would be impractical in many scenarios.
  2. One could imagine a scenario where multiple ranges might want to map to the same shard. I think this is supported in the table metadata structure, but if not, then this could be an issue, too.

jasonmp85 commented 9 years ago

Range partitioning is commonly brought up in the context of time series data. Specifically, users often wish to define a partition "width" and add new partitions for the "current" data as time progresses (i.e. no need to predefine the entire range of time). Another variant is to use a special bucket capped at "infinity" and change its upper bound to something reasonable once it has a certain amount of data.
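
The two time-series patterns above can be sketched in plain Python. This is only an illustration under stated assumptions, not pg_shard's metadata model or API: the `Shard` class and the `add_next_partition` and `cap_open_bucket` helpers are hypothetical names, ranges are half-open `[lo, hi)`, and `None` stands in for "infinity". Opening a fresh unbounded bucket after capping the old one is also an assumption about how the second pattern would continue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Shard:
    """Hypothetical shard covering the half-open range [lo, hi)."""
    lo: datetime
    hi: Optional[datetime]  # None means "infinity" (open-ended bucket)


def add_next_partition(shards: List[Shard], width: timedelta) -> Shard:
    """Pattern 1: append a fixed-width partition right after the last one."""
    last = shards[-1]
    if last.hi is None:
        raise ValueError("last shard is open-ended; cap it first")
    new = Shard(lo=last.hi, hi=last.hi + width)
    shards.append(new)
    return new


def cap_open_bucket(shards: List[Shard], upper: datetime) -> Shard:
    """Pattern 2: cap the "infinity" bucket, then open a new one after it."""
    last = shards[-1]
    if last.hi is not None:
        raise ValueError("last shard is already capped")
    if upper <= last.lo:
        raise ValueError("upper bound must exceed the lower bound")
    last.hi = upper                   # give the old bucket a real upper bound
    new = Shard(lo=upper, hi=None)    # assumption: a fresh open bucket follows
    shards.append(new)
    return new
```

In both cases only the newest range changes, so the full span of time never has to be predefined.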

> the addition of more shards after the initial process is not supported

With long values, that pattern doesn't seem to apply. How would you add shards? Presumably if the data can take on any long value, you'll need shards that together cover every possible value, but then the concept of adding shards doesn't apply. Would you need to "split" an existing shard, instead?
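
To make the "split" idea concrete, here is a minimal sketch assuming shards are described by inclusive `(lo, hi)` integer ranges that together cover the whole 64-bit signed space. The `split_range` helper and the tuple representation are hypothetical and say nothing about how pg_shard stores ranges or would move data.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive [lo, hi]; hypothetical representation

LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


def split_range(ranges: List[Range], split_at: int) -> List[Range]:
    """Split the range containing split_at into [lo, split_at - 1] and
    [split_at, hi]; ranges that do not strictly contain it are unchanged."""
    result: List[Range] = []
    for lo, hi in ranges:
        if lo < split_at <= hi:
            result.append((lo, split_at - 1))
            result.append((split_at, hi))
        else:
            result.append((lo, hi))
    return result


# Start with one shard covering every long value, then split it twice.
ranges = split_range([(LONG_MIN, LONG_MAX)], 0)
ranges = split_range(ranges, 1000)
```

The covered space never grows here. The number of shards increases only by subdividing a range that already exists, which is why "adding" a shard does not have an obvious meaning for this kind of key.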

> One could imagine a scenario where multiple ranges might want to map to the same shard

As for a scenario where "multiple ranges map to the same shard"… can you elaborate? Do you mean a node will have multiple ranges sitting on it (that's already done), or is it crucial that our shard concept cover multiple ranges? Keep in mind that pg_shard places many distinct shards on each node, so it's already quite easy to have two shards covering different ranges on one node.
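
The difference between the two readings can be sketched as follows. All identifiers, node names, and range values are made up for illustration and do not reflect pg_shard's actual metadata tables.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Reading A: each shard covers ONE range, and several shards share a node.
# Entries are (shard_id, lo, hi) with inclusive bounds, sorted by lo.
shard_ranges: List[Tuple[int, int, int]] = [
    (1, 0, 999),
    (2, 1000, 1999),
    (3, 2000, 2999),
]
shard_to_node: Dict[int, str] = {1: "node-a", 2: "node-b", 3: "node-a"}


def find_shard(value: int) -> Optional[int]:
    """Binary-search the single-range shards for the one containing value."""
    starts = [lo for _, lo, _ in shard_ranges]
    i = bisect_right(starts, value) - 1
    if i < 0:
        return None
    shard_id, lo, hi = shard_ranges[i]
    return shard_id if lo <= value <= hi else None


# Reading B: ONE shard covers several disjoint ranges.
# Entries are (lo, hi, shard_id); shard 1 owns two separate ranges.
range_to_shard: List[Tuple[int, int, int]] = [
    (0, 999, 1),
    (1000, 1999, 2),
    (2000, 2999, 1),
]


def find_shard_multi(value: int) -> Optional[int]:
    """Linear scan over ranges; several ranges may name the same shard."""
    for lo, hi, shard_id in range_to_shard:
        if lo <= value <= hi:
            return shard_id
    return None
```

Under reading A, the values 500 and 2500 fall in different shards (1 and 3) that happen to sit on the same node. Under reading B, both values fall in shard 1 itself.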