apache / shardingsphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Apache License 2.0
19.63k stars 6.67k forks

[Integration Plan] Set up your distributed PostgreSQL database on Kubernetes #29411

Open tristaZero opened 7 months ago

tristaZero commented 7 months ago

Hello community,

StackGres and the ShardingSphere team are currently in the planning stages of an integration project. The aim of this integration is to enable end-users to easily set up a PostgreSQL Shard cluster (Distributed PostgreSQL database) on Kubernetes with just one command ✨!

Q: What's a PostgreSQL Shard cluster?

Here you go: Sharded cluster and Database sharding

Q: StackGres is?

The full-stack Postgres Platform, fully Open Source. Learn more here.

Q: ShardingSphere is?

... Wait, you're in this community now. ShardingSphere is a distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.

Q: Any progress so far?

ahachete commented 7 months ago

Thank you @tristaZero for creating this issue and the introduction to the topic. Hi ShardingSphere Community!

To give a little bit of initial background: StackGres is a fully open source and advanced platform for running Postgres on Kubernetes. One of StackGres' main goals is to expose its functionality through a set of carefully designed, fully declarative CRDs. Effectively, these CRDs (which users often see as YAML files) become StackGres' user-facing API, though StackGres also has a REST API and a Web Console, and all three can be used interchangeably.

Recently StackGres introduced support for sharding, via the SGShardedCluster CRD. Following StackGres' philosophy, the goal was to make it very easy for StackGres users to deploy sharded clusters. Users should not need to understand the complexities of wiring up a sharded cluster with all the required components (high availability, connection pooling, backups for all nodes, customized configurations, monitoring, etc.), nor the specific details of the sharding technology. All of that is abstracted by the SGShardedCluster CRD: the user only makes high-level decisions (such as the number of shards or whether they are highly available) and StackGres does all the heavy lifting behind the scenes.

As of now, StackGres supports sharding with Citus, which is well supported and offers a wide set of features (including, e.g., distributed backups and non-homogeneously sized shards). But the goal is to expand this CRD to support other sharding technologies, and work has started at StackGres to support ShardingSphere as one of them. Getting help and feedback from the ShardingSphere Community would be fantastic.

To show an example, this is a (simple) case of the YAML required to create a sharded cluster with Citus, with four shards, all of them highly available (one replica per shard, so a total of two instances per shard), and two coordinator instances, so that the coordinator is highly available too:

apiVersion: stackgres.io/v1alpha1
kind: SGShardedCluster
metadata:
  name: cluster
spec:
  type: citus
  database: mydatabase
  postgres:
    version: '15'
  coordinator:
    instances: 2
    pods:
      persistentVolume:
        size: '10Gi'
  shards:
    clusters: 4
    instancesPerCluster: 2
    pods:
      persistentVolume:
        size: '10Gi'

(example and additional documentation taken from StackGres docs)

The goal would be to introduce changes into this CRD to add support for ShardingSphere (and obviously implement them too). The work at StackGres is coordinated via the Support Apache ShardingSphere in StackGres issue. Our issues are public, and anyone may post there, so feel free to jump in anytime! I'm also happy to track this issue here to receive any kind of feedback and ideas.
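
To make the idea concrete, here is a purely illustrative sketch of how the Citus example above might look from the user's side once ShardingSphere is supported. The `type: shardingsphere` value is an assumption made for illustration only: nothing in this thread fixes the final field names or values. Whether the `coordinator` section would map to ShardingSphere-Proxy instances, and whether those would need a persistent volume at all, are likewise open design questions rather than settled facts.

```yaml
# Hypothetical sketch only: the `shardingsphere` type value is an assumption,
# not a confirmed part of the SGShardedCluster CRD in this discussion.
apiVersion: stackgres.io/v1alpha1
kind: SGShardedCluster
metadata:
  name: cluster
spec:
  type: shardingsphere   # assumed value; the Citus example uses `citus`
  database: mydatabase
  postgres:
    version: '15'
  coordinator:
    instances: 2         # assumed to play the coordinator role, as in the Citus case
    pods:
      persistentVolume:
        size: '10Gi'
  shards:
    clusters: 4          # four shards
    instancesPerCluster: 2   # one primary plus one replica per shard
    pods:
      persistentVolume:
        size: '10Gi'
```

The point of the sketch is that, ideally, switching sharding technology would be a one-line change for the user, with everything else in the spec staying the same.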

BTW, for those interested, StackGres is also written in Java, like ShardingSphere, despite being a Kubernetes Operator ;) Here's the source code.

github-actions[bot] commented 5 months ago

There hasn't been any activity on this issue recently, and in order to prioritize active issues, it will be marked as stale.
