Closed marcocitus closed 2 years ago
Supporting replicated tables on MX is very similar to supporting reference tables on MX: we always use 2PC, serialize modifications, and make sure that the citus_disable_node() and citus_activate_node() UDFs handle replicated tables gracefully.
To support replicated tables on MX, I suggest the following:
- `SHARD_STATE_INACTIVE`. Citus never marks any placement as INACTIVE anymore; everything is done via 2PC if it involves multiple placements/nodes (#5381).
- `SerializeNonCommutativeWrites()`
- `citus_disable_node` deletes the placements on the node, similar to reference table placements.
- `citus_activate_node` does nothing regarding (under-)replicated tables. Instead, the user should call the rebalancer to make sure tables are replicated fine.
- `citus_activate_node` on another node, in the MX world? This should work fine.
- `s` vs `c` doesn't reflect any difference.

I feel confident enough to close the issue. The involved PRs are: #5379, #5380, #5381, #5386, #5392, #5405, #5476, #5469, #5470 and #5486.
The remaining improvements can be tracked via individual issues.
We currently take shard resource locks on the coordinator node to guarantee that replicas remain consistent and to prevent the deadlocks that concurrent multi-shard commands could otherwise cause. However, holding these locks on the coordinator prevents workers from performing replicated writes or multi-shard commands on MX tables, including writes to reference tables and INSERT..SELECT commands, which harms the MX experience. It also causes issue #925.
A way to resolve this would be to move those locks to the workers that store the shards, either by introducing a UDF for taking the advisory lock or by using explicit table locks on the shards. These would be sent prior to issuing the multi-shard command. The locks need to be obtained sequentially and in a consistent order to avoid distributed deadlocks, after which the actual commands can be sent in parallel.
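The ordering requirement above can be illustrated with a small, self-contained simulation. This is only a sketch and not Citus code: `threading.Lock` objects stand in for the per-shard locks that would live on the workers, and `lock_for` and `run_multi_shard_command` are hypothetical names. It shows the two phases described above: locks are taken one at a time in ascending shard ID order, and only then are the per-shard commands run in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-shard locks held on the workers.
shard_locks = {}
_registry_guard = threading.Lock()


def lock_for(shard_id):
    """Return the (simulated) lock that protects one shard."""
    with _registry_guard:
        return shard_locks.setdefault(shard_id, threading.Lock())


def run_multi_shard_command(shard_ids, command):
    """Lock shards sequentially in ascending ID order, then run
    `command` on every shard in parallel and return the results."""
    ordered = sorted(set(shard_ids))
    acquired = []
    try:
        # Phase 1: sequential acquisition in one global order, so two
        # concurrent callers can never wait on each other in a cycle.
        for shard_id in ordered:
            lock = lock_for(shard_id)
            lock.acquire()
            acquired.append(lock)
        # Phase 2: all locks are held, so the commands can run in parallel.
        with ThreadPoolExecutor(max_workers=len(ordered) or 1) as pool:
            return dict(zip(ordered, pool.map(command, ordered)))
    finally:
        for lock in reversed(acquired):
            lock.release()
```

Two callers that name the same shards in opposite order, for example `[1, 2]` and `[2, 1]`, both end up locking shard 1 first, so neither can hold one lock while waiting for the other.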
An alternative approach is to always route unsupported commands through the coordinator, which could also work for DDL commands. The workers would then need a coordinator endpoint to which they can send those commands.
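A minimal sketch of that routing decision, under stated assumptions: the `Endpoint` type, the command tags, and the `LOCALLY_SUPPORTED` set are all hypothetical and chosen only for illustration. Which commands a worker can actually run itself is not specified here.

```python
from dataclasses import dataclass

# Hypothetical set of command tags a worker can execute on its own.
LOCALLY_SUPPORTED = frozenset({"SELECT", "INSERT", "UPDATE", "DELETE"})


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def route_command(command_tag, local, coordinator, supported=LOCALLY_SUPPORTED):
    """Pick where a command runs: on the local worker if its tag is
    supported there, otherwise on the coordinator endpoint."""
    return local if command_tag in supported else coordinator
```

With these assumed tags, `route_command("ALTER TABLE", worker, coordinator)` returns the coordinator endpoint, while `route_command("SELECT", worker, coordinator)` stays on the worker.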