citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Optimize Database Performance with Custom Shard Distribution for Multi-Tenant Architecture #7409

Open hhhizzz opened 10 months ago

hhhizzz commented 10 months ago

Hi team,

Background

We are implementing a multi-tenant architecture in our database, where data is split based on tenant IDs (Tid). With our current setup, we've observed significant differences in data volume among tenants, which leads to performance problems.

Current Setup

We have multiple tables serving different metrics, as illustrated below:

Table 1

| Tid  | metric1 | metric2 |
|------|---------|---------|
| Tid1 | 1       | 1       |
| Tid1 | 2       | 2       |
| Tid2 | 3       | 3       |

Table 2

| Tid  | metric3 | metric4 |
|------|---------|---------|
| Tid1 | 3       | 3       |
| Tid1 | 4       | 4       |
| Tid2 | 4       | 4       |
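
For reference, a minimal sketch of how such tables are typically distributed by tenant in Citus (the table and column names here are assumptions based on the description above):

```sql
-- Hypothetical schemas matching the tables above
CREATE TABLE table1 (tid text, metric1 int, metric2 int);
CREATE TABLE table2 (tid text, metric3 int, metric4 int);

-- Distribute both tables by tenant id. By default Citus colocates
-- tables that share a distribution column type and shard count, so
-- rows with the same tid end up in the same shard group.
SELECT create_distributed_table('table1', 'tid');
SELECT create_distributed_table('table2', 'tid', colocate_with => 'table1');
```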

Challenge

Our primary goal is to improve database performance by spreading larger tenants across multiple nodes. We use Citus's distribution functions to distribute table1 and table2. However, we've run into a data skew issue: Tid1 is always stored on Node 1, so Node 1 holds more data than the other nodes, as shown below:

Tid1 is in Shard1; Tid2 is in Shard2.

| Shard    | Node  | Group | Size |
|----------|-------|-------|------|
| Table1_1 | Node1 | 1     | 2    |
| Table2_1 | Node1 | 1     | 2    |
| Table1_2 | Node2 | 2     | 1    |
| Table2_2 | Node2 | 2     | 1    |
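
To see how shards and their sizes map to nodes, a query along these lines can help (a minimal sketch; `citus_shards` is the built-in view available in Citus 10+, and the table names are the ones assumed above):

```sql
-- Show where each shard of table1/table2 lives and how large it is
SELECT table_name, shardid, nodename, nodeport, shard_size
FROM citus_shards
WHERE table_name IN ('table1'::regclass, 'table2'::regclass)
ORDER BY nodename, table_name;
```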

Query

Is there a solution within Citus to address this issue? Our initial thought is to customize shard placement so that, for instance, data for Tid1 in table1 is stored on Node 1 while the same tenant's data in table2 is stored on Node 2. However, we haven't found any function or feature in Citus that directly supports this level of shard placement customization.

Any guidance or suggestions on how to approach this would be greatly appreciated.

Thank you!

hanefi commented 10 months ago

If you wish to have fast joins between table1 and table2, the shards containing Tid1's rows need to be on the same worker node. I suggest you do not break the colocation between the shards of the same tenant across different tables.
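
For context, a minimal sketch of why colocation matters for joins (assuming the table and column names from the issue): when both tables are colocated on tid, a tenant-scoped join can be executed entirely on one worker.

```sql
-- With table1 and table2 colocated on tid, this join runs on the
-- single worker that holds Tid1's shard group, with no cross-node
-- data movement.
SELECT t1.tid, t1.metric1, t2.metric3
FROM table1 t1
JOIN table2 t2 ON t1.tid = t2.tid
WHERE t1.tid = 'Tid1';
```

If the shards of the two tables for the same tenant were placed on different nodes, that join would instead require repartitioning or pulling data across the network.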

Some ideas: