Multi-level sharding

anarazel commented 7 years ago

See also #1345, which I personally think is a prerequisite for this.

For several use cases it'd be very useful to be able to shard (partition) data across several dimensions. The primary reason for that is that, oh wonder, different partitioning schemes have different advantages, and sometimes those advantages need to be combined to solve a problem.

In particular:

Hash partitioning over something like 'tenant_id' has the advantage of great locality, but can yield very large shards for large customers. There's usually no parallelism for single-tenant queries, and it's expensive to remove old data; colocation is very commonly possible.
Hash partitioning over something like 'insert_id' or 'uuid' has the advantage of very evenly distributed data. There's usually a lot of parallelism, but no locality (all shards have to be queried), and it's expensive to remove old data; colocation is often possible.
Append partitioning over something like time: it's cheap to prune out old data, and queries have a fair bit of locality. Parallelism depends on the ingest method. DML is often very limited, because routing changes is more complicated. Colocation is very unlikely, which often prevents more complex queries.
Range partitioning: currently hard to use, but allows combining some of the advantages of hash partitioning with some of the advantages of append-based partitioning, particularly when using composite keys. It can provide good locality, colocation and cheap pruning. But it can be very hard to get decent parallelism and distribution, because keys aren't hashed.
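To make the trade-offs above a bit more concrete, here's a minimal Python sketch of which shards a query has to visit under single-level hash and range partitioning. It's purely illustrative: the SHA-256 based hash, the function names and the boundary layout are assumptions, not how Citus actually hashes or stores shard metadata. Append-partitioned shards behave much like the range case for pruning, since each covers a known interval.

```python
from __future__ import annotations

import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Stable hash of the key modulo the shard count (illustrative, not Citus's hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def hash_shards_for(num_shards: int, key: str | None = None) -> list[int]:
    """Shards a query must visit under hash partitioning.

    With an equality filter on the distribution key only one shard is needed;
    without one, every shard has to be queried.
    """
    if key is not None:
        return [hash_shard(key, num_shards)]
    return list(range(num_shards))


def range_shards_for(bounds: list[int], lo: int, hi: int) -> list[int]:
    """Shards overlapping the half-open interval [lo, hi) under range partitioning.

    bounds[i] is the inclusive lower bound of shard i (sorted ascending);
    shard i covers [bounds[i], bounds[i+1]) and the last shard is open-ended.
    """
    first = max(bisect.bisect_right(bounds, lo) - 1, 0)
    last = max(bisect.bisect_left(bounds, hi) - 1, 0)
    return list(range(first, last + 1))


def range_shards_older_than(bounds: list[int], cutoff: int) -> list[int]:
    """Shards whose whole interval lies below cutoff: they can be dropped outright,
    which is why removing old data is cheap with range/append partitioning."""
    return [i for i in range(len(bounds) - 1) if bounds[i + 1] <= cutoff]


if __name__ == "__main__":
    print(hash_shards_for(4, key="tenant-42"))             # a single shard
    print(hash_shards_for(4))                              # [0, 1, 2, 3]
    print(range_shards_for([0, 100, 200, 300], 150, 250))  # [1, 2]
    print(range_shards_older_than([0, 100, 200, 300], 200))  # [0, 1]
```

With hash partitioning, old data can only be removed by deleting rows in every shard, whereas the range case just drops whole shards.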

One way to combine some of these advantages is to allow partitioning by something like hash(user_id), range(time). If user_id is known (typical for DML and OLTP-ish DQL), such statements can be sent to a limited number of shards (or to only one, if time is also known). For more analytical queries, time will usually be known, which allows some parallelism for wider ranges and more efficient pruning.
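A rough sketch of shard pruning under such a hash(user_id), range(time) scheme, assuming each shard is identified by a (hash bucket, time slot) pair. The bucket count, time boundaries and the `prune` helper are hypothetical and only meant to show how knowing user_id and/or time narrows the set of shards.

```python
from __future__ import annotations

import bisect
import hashlib
from itertools import product


def hash_bucket(key: str, num_buckets: int) -> int:
    """Stable first-level hash bucket for a key (illustrative hash function)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def prune(num_buckets: int, time_bounds: list[int],
          user_id: str | None = None,
          time_range: tuple[int, int] | None = None) -> list[tuple[int, int]]:
    """Return the (bucket, slot) shards a query has to touch.

    time_bounds[i] is the inclusive lower bound of time slot i; the last slot is
    open-ended. time_range is a half-open interval [lo, hi).
    """
    # First level: a known user_id pins the query to one hash bucket.
    if user_id is not None:
        buckets = [hash_bucket(user_id, num_buckets)]
    else:
        buckets = list(range(num_buckets))
    # Second level: a known time range keeps only the overlapping slots.
    if time_range is None:
        slots = list(range(len(time_bounds)))
    else:
        lo, hi = time_range
        first = max(bisect.bisect_right(time_bounds, lo) - 1, 0)
        last = max(bisect.bisect_left(time_bounds, hi) - 1, 0)
        slots = list(range(first, last + 1))
    return list(product(buckets, slots))


if __name__ == "__main__":
    bounds = [0, 100, 200, 300]  # four hypothetical time slots
    print(len(prune(8, bounds, user_id="u1", time_range=(120, 130))))  # 1
    print(len(prune(8, bounds, user_id="u1")))                         # 4
    print(len(prune(8, bounds, time_range=(150, 250))))                # 16
    print(len(prune(8, bounds)))                                       # 32
```

The OLTP-style case (user_id and time known) hits a single shard, while the analytical case (time range only) still prunes most slots and leaves one shard per bucket in each remaining slot to scan in parallel.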

Figuring out how to create a good user interface for this seems harder than actually implementing multi-level partitioning. To achieve decent colocation, I suspect we'll need good hash/range partitioning rather than relying on hash/append.

Besides the user-interface challenges, there's also the issue that combined hash/range or hash/append partitioning drastically reduces the likelihood that the router executor can be used, which might be a problem for some of the apps that'd benefit from such multi-level partitioning. It might be worthwhile to offer a hash/local-range partitioning option, which forces all second-level partitions of a given hash shard onto the same node. That'd allow more efficient deletion of old data, though without all of the parallelism benefits.
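A small sketch of why hash/local-range placement helps single-node (router-style) queries. The two placement rules below are assumptions for illustration: "local range" places every time slot of a bucket on that bucket's node, while the spread variant is just one possible hash/range placement that also distributes slots across nodes.

```python
from __future__ import annotations


def node_for_shard(bucket: int, slot: int, num_nodes: int, local_range: bool) -> int:
    """Place a (bucket, slot) shard on a node (hypothetical placement rules).

    local_range=True: all time slots of a bucket live on the bucket's node.
    local_range=False: slots are spread across nodes as well.
    """
    if local_range:
        return bucket % num_nodes
    return (bucket + slot) % num_nodes


def nodes_touched(shards: list[tuple[int, int]], num_nodes: int,
                  local_range: bool) -> set[int]:
    """Distinct nodes a query over the given shards has to contact."""
    return {node_for_shard(b, s, num_nodes, local_range) for b, s in shards}


if __name__ == "__main__":
    # One user's data: hash bucket 3, four time slots, four nodes.
    user_shards = [(3, s) for s in range(4)]
    print(nodes_touched(user_shards, 4, local_range=True))   # {3}
    print(nodes_touched(user_shards, 4, local_range=False))  # {0, 1, 2, 3}
```

With local-range placement, a per-user query without a time filter still stays on one node, and dropping an old time slot is a local operation on each node. The cost is that a single user's wide time-range query no longer fans out across nodes.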

ozgune commented 6 years ago

We also had an issue, #763, that discussed multi-level sharding.

@sergeyvm mentioned in that issue that he'd like to have multi-level sharding (vs sharding by one dimension and then partitioning further by a different dimension) because the partition pruning logic in PostgreSQL isn't efficient.

l-we commented 4 years ago

Any progress on PG12 multi-level partitioning???

citusdata / citus

Multi-level sharding #1346