Open hhhizzz opened 10 months ago
If you wish to have fast joins between table1 and table2, you need to have the shard containing rows with Tid1 on the same worker node. I suggest you do not break the colocation between the shards of the same tenant for different tables.
Some ideas:
by_disk_size
may be a nice strategy for you to try out. You can also test new cost functions and threshold values for your strategy.
Hi team,
Background
We are implementing a multi-tenant architecture in our database, where data is split based on tenant IDs (Tid). Despite our current setup, we've observed significant data volume discrepancies among tenants, leading to performance challenges.
Current Setup
We have multiple tables serving different metrics, as illustrated below:
Table 2
Challenge
Our primary goal is to optimize our database's performance by custom distributing larger tenants across multiple nodes. We use specific functions to distribute
table1
andtable2
. However, we've encountered a data skew issue, particularly with Tid 1 always being stored in Node 1, leading to an imbalance where Node 1 stores more data than other nodes. Like this:Tid1 is in Shard1 Tid2 is in Shard2
Query
Is there a solution within Citus to address this issue? Our initial thought is to customize shard placement, enabling, for instance, data related to Tid 1 in
table1
to be stored on Node 1, while the same Tid intable2
gets stored on Node 2. However, we haven't found any function or feature in Citus that directly supports this level of shard placement customization.Any guidance or suggestions on how to approach this would be greatly appreciated.
Thank you!