Optimize Database Performance with Custom Shard Distribution for Multi-Tenant Architecture

Hi team,

Background

We are implementing a multi-tenant architecture in our database, where data is split based on tenant IDs (Tid). Despite our current setup, we've observed significant data volume discrepancies among tenants, leading to performance challenges.

Current Setup

We have multiple tables serving different metrics, as illustrated below:

Table 1 Tid	metric1	metric2
Tid1	1	1
Tid1	2	2
Tid2	3	3

Table 2

Tid	metric3	metric4
Tid1	3	3
Tid1	4	4
Tid2	4	4

Challenge

Our primary goal is to optimize our database's performance by custom distributing larger tenants across multiple nodes. We use specific functions to distribute table1 and table2. However, we've encountered a data skew issue, particularly with Tid 1 always being stored in Node 1, leading to an imbalance where Node 1 stores more data than other nodes. Like this:

Tid1 is in Shard1 Tid2 is in Shard2

Shard	Node	Group	Size
Table1_1	Node1	1	2
Table2_1	Node1	1	2
Table1_2	Node2	2	1
Table2_2	Node2	2	1

Query

Is there a solution within Citus to address this issue? Our initial thought is to customize shard placement, enabling, for instance, data related to Tid 1 in table1 to be stored on Node 1, while the same Tid in table2 gets stored on Node 2. However, we haven't found any function or feature in Citus that directly supports this level of shard placement customization.

Any guidance or suggestions on how to approach this would be greatly appreciated.

Thank you!

citusdata / citus