Open YMikheev opened 3 months ago
Can you see the locks following https://wiki.postgresql.org/wiki/Lock_Monitoring instructions?
if you try to synchronize metadata: SELECT start_metadata_sync_to_all_nodes();
select relation::regclass, * from pg_locks where not granted;
relation locktype database relation-2 page tuple virtualxid transactionid classid objid objsubid virtualtransaction pid mode granted fastpath waitstart pg_dist_node relation 16385 16546 NULL NULL NULL NULL NULL NULL NULL 9/4571 27664 RowShareLock FALSE FALSE 2024-08-06 16:54:35.652722+00
because of advisory lock is granted:
You can try to use this query https://wiki.postgresql.org/wiki/Lock_Monitoring#:~:text=information%20in%20pg_stat_activity-,%D0%A1ombination%20of%20blocked%20and%20blocking%20activity,-The%20following%20query
to find the blocking pid.
I killed the prepared transaction and now see: coordinator: the process 9420 "Citus Metadata Sync Daemon" is blocking pg_dist_node workers 0 and 3 (they have metadatasynced = false), the process has started:
after some days nothing happens: pg_dist_node is locked (on coordinator), worker 0 and 3 wait event "SyncRep"
I fixed worker 0 and 3.
there is still a problem with the coordinator. the process "Citus Metadata Sync Daemon" is blocking pg_dist_node pg_dist_node has metadatasynced = false
when I try to kill a process "Citus Metadata Sync Daemon", it restarts and continues to block pg_dist_node.
It is possible to fix this problem?
Hello @emelsimsek.
Can you help me? It is very critical for us.
the worker22 is not active (existing) worker. postgres log: WARNING: connection to the remote node worker22:**** failed with the following error: server closed the connection unexpectedly
You probably want to fix that so citus can proceed...
Yes. Now we have one problem: the "Citus Metadata Sync Daemon" is blocking pg_dist_node.
For an unknown reason, the cluster stopped working. His condition is: Coordinator: nodeid groupid nodename hasmetadata isactive metadatasynced shouldhaveshards 54 11 worker21 TRUE TRUE FALSE TRUE 51 12 worker23 TRUE TRUE TRUE TRUE
pg_dist_node is locked by ExclusiveLock with process without pid (because of that it is not possible to do anything with metadata: meta sync for example ) It is advisory lock - I do not now how to kill that lock ?
Worker21: nodeid groupid nodename isactive metadatasynced shouldhaveshards 54 11 worker21 TRUE TRUE TRUE 51 12 worker22 TRUE TRUE TRUE
the worker22 is not active (existing) worker. postgres log: WARNING: connection to the remote node worker22:**** failed with the following error: server closed the connection unexpectedly
Worker23: nodeid groupid nodename isactive metadatasynced shouldhaveshards 54 11 worker21 TRUE FALSE TRUE 51 12 worker23 TRUE TRUE TRUE
It is possible build the cluster?