citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.34k stars 656 forks source link

Improve non-transactional metadata-sync experience considering clusters with large number of shards #7487

Open onurctirtir opened 7 months ago

onurctirtir commented 7 months ago
  1. Investigate how we end-up with large number of locks in non-transactional mode.
  2. Make non-transactional metadata-sync truly idempotent: e.g., the commands that we send to detach partitions is not truly idempotent even if we do ALTER TABLE IF EXISTS DETACH PARTITION. This command guards us against the cases where the parent relation does not exist. However, it doesn't save us in the cases where:
    • the child relation does not exist
    • the child relation is not a partition of parent relation
onurctirtir commented 7 months ago

@aykut-bozkurt / @halilozanakgul / @mtuncer / @vlkncetin, please feel free to add more comments if I'm missing something.

cc: @austuner / @pinodeca on prioritization of this issue.

aykut-bozkurt commented 6 months ago
  1. Those locks that are accumulating on the coordinator are normal. They are not that much even for a huge cluster, so it does not cause any failure. We verified non-transactional sync regularly clears accumulated locks at all workers via many small transactions.
  2. This sounds true to me. We actually need to make sure that anywhere where we propagate "detach partition" is idempotent. It is extra important for non-transactional metadata-sync as it would leave the sync in an intermediate state that should be completed.
aykut-bozkurt commented 4 months ago

I found out one place where we accumulate locks on the coordinator. extern_IsColumnarTableAmTable points to IsColumnarTableAmTable which acquires lock on the relation without releasing it. See below:

if (accessMethod == NULL && extern_IsColumnarTableAmTable(relationId))

We can solve it with a variant of IsColumnarTableAmTable which takes no lock, which is fine to replace above snippet where we already take lock on metadata tables.