Open anarazel opened 7 years ago
We also had an issue in #763 that discussed multi-level sharding.
@sergeyvm mentioned in that issue that he'd like to have multi-level sharding (vs sharding by one dimension and then partitioning further by a different dimension) because the partition pruning logic in PostgreSQL isn't efficient.
Open pg12 multi-level partition???
See also #1345, which I personally think is a pre-requisite for this.
For several use-cases it'd be very useful to be able to shard (partition) data across several dimensions. The primary reasons for that is that, oh wonder, different partitioning schemes have different advantages, and sometimes the advantages are required to be combined to solve issues.
In particular:
One way to combine some of the advantages here, is to allow partitioning by something like hash(user_id), range(time). If user_id is known (typical for DML, OLTPish DQL), then such statements can be sent to a limited number of shards (or only one if time is also known). For more analytical queries time will usually be known, which'll allow some parallelism for some parallelism for wider ranges and allows more efficient pruning.
Figuring out how to create a good user interface for this seems harder than actually implementing multi-level partitioning. To achieve decent colocation I suspect we'll need good hash/range partitioning, rather than relying on hash/append.
Besides the user-interface challenges, there's also the issue that combined hash/range or hash/append partitioning drastically reduces the likelihood that route executor can be used, which might be an issue for some of the apps that'd benefit from such multi-level partitioning. It might be worthwhile to have an option of hash/local-range partitioning, which forces all the second-level partitions to be on the same node. That'd allow for more efficient deletion of old data, without all of the parallelism benefits.